We consider the model $Z_i = X_i + \varepsilon_i$, for i.i.d. $X_i$'s and $\varepsilon_i$'s and independent sequences $(X_i)_{i\in\mathbb{N}}$
and $(\varepsilon_i)_{i\in\mathbb{N}}$. The density $f_\varepsilon$ of $\varepsilon_1$ is assumed to be known, whereas the one of $X_1$, denoted by
$g$, is unknown. Our aim is to estimate linear functionals of $g$, $\langle\psi,g\rangle$, for a known function $\psi$.
We propose a general estimator of $\langle\psi,g\rangle$ and study the rate of convergence of its quadratic
risk as a function of the smoothness of $g$, $f_\varepsilon$ and $\psi$. Different contexts with dependent data,
such as stochastic volatility and AutoRegressive Conditionally Heteroskedastic models, are also
considered. An estimator which is adaptive to the smoothness of unknown g is then proposed, 
following a method studied by Laurent et al. (Preprint (2006)) in the Gaussian white noise model. 
We give upper bounds and asymptotic lower bounds of the quadratic risk of this estimator. The 
results are applied to adaptive pointwise deconvolution, in which context losses in the adaptive 
rates are shown to be optimal in the minimax sense. They are also applied in the context of the 
stochastic volatility model. 

Keywords: adaptive density estimation; ARCH models; deconvolution; linear functionals; 
model selection; penalized contrast; stochastic volatility model 

1. Introduction 

We consider the convolution model 

$$Z_i = X_i + \varepsilon_i. \qquad (1)$$

The sequences $(X_i)_{i\in\mathbb{N}}$ and $(\varepsilon_i)_{i\in\mathbb{N}}$ are independent sequences of real valued random
variables. The $X_i$ are i.i.d. with unknown density $g$, the $\varepsilon_i$ are i.i.d. with known density
$f_\varepsilon$. The Fourier transform of a function $u \in L_1(\mathbb{R})$ is denoted by $u^*(x) = \int e^{itx}u(t)\,dt$.
The smoothness of $f_\varepsilon$ is described by parameters $\gamma$, $\alpha$, $\rho$ in the following assumption:

There exist non-negative numbers $\kappa_0$, $\kappa_0'$, $\gamma$, $\alpha$ and $\rho$ such that $f_\varepsilon^*$ satisfies

$$\kappa_0(x^2+1)^{-\gamma/2}\exp\{-\alpha|x|^\rho\} \le |f_\varepsilon^*(x)| \le \kappa_0'(x^2+1)^{-\gamma/2}\exp\{-\alpha|x|^\rho\}, \qquad (2)$$

with $\gamma > 1$ when $\rho = 0$. If either $\alpha = 0$ or $\rho = 0$, we set $(\alpha,\rho) = (0,0)$. Since $f_\varepsilon$ is known,
the constants $\alpha$, $\rho$, $\kappa_0$, $\kappa_0'$ and $\gamma$ defined in (2) are known.

When $\rho = 0$ in (2), the errors are called ordinary smooth errors. When $\alpha > 0$ and
$\rho > 0$, they are called supersmooth. The standard examples for supersmooth densities are
Gaussian or Cauchy distributions (supersmooth of order $\gamma = 0$, $\rho = 2$ and $\gamma = 0$, $\rho = 1$,
respectively). An example of an ordinary smooth density is the Laplace distribution
($\rho = 0 = \alpha$ and $\gamma = 2$).
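The three standard examples can be checked numerically against the envelope appearing in assumption (2). The following is a minimal sketch, assuming unit scale parameters; the function names and the evaluation grid are illustrative choices and are not part of the paper.

```python
import numpy as np

def gaussian_cf(x):
    """Fourier transform of the N(0, 1) density: exp(-x^2 / 2)."""
    return np.exp(-0.5 * x ** 2)

def cauchy_cf(x):
    """Fourier transform of the standard Cauchy density: exp(-|x|)."""
    return np.exp(-np.abs(x))

def laplace_cf(x):
    """Fourier transform of the standard Laplace density: 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x ** 2)

def envelope(x, gamma, alpha, rho):
    """Envelope of assumption (2): (x^2 + 1)^(-gamma/2) * exp(-alpha * |x|^rho)."""
    return (x ** 2 + 1.0) ** (-gamma / 2.0) * np.exp(-alpha * np.abs(x) ** rho)

x = np.linspace(0.0, 8.0, 81)  # illustrative grid
ratios = {
    # Gaussian: gamma = 0, rho = 2 (alpha = 1/2 for unit variance)
    "gaussian": gaussian_cf(x) / envelope(x, gamma=0.0, alpha=0.5, rho=2.0),
    # Cauchy: gamma = 0, rho = 1 (alpha = 1 for unit scale)
    "cauchy": cauchy_cf(x) / envelope(x, gamma=0.0, alpha=1.0, rho=1.0),
    # Laplace: rho = 0 = alpha, gamma = 2
    "laplace": laplace_cf(x) / envelope(x, gamma=2.0, alpha=0.0, rho=0.0),
}
```

In each case the ratio is constant in $x$, so the two-sided bound in (2) holds with $\kappa_0 = \kappa_0' = 1$ for these unit-scale examples.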

In this context, many papers have studied the deconvolution problem. Many different 
strategics have been developed in order to estimate the distribution g of the unobserved 
Xi, when g is assumed to belong to some smoothness class defined by 



$$\mathcal{S}(b,a,r,L) = \Big\{ g \text{ such that } \int |g^*(x)|^2 (x^2+1)^b \exp\{2a|x|^r\}\,dx \le 2\pi L \Big\}, \qquad (3)$$

where $b$, $a$, $r$, $L$ are some unknown non-negative numbers, such that $b > 1/2$ when $r = 0$.
If either $a = 0$ or $r = 0$, we set $(a,r) = (0,0)$ and we say that the density is ordinary
smooth. When both $a, r > 0$, we call the density supersmooth.

In this paper, we are interested in the problem of estimating $\theta(g) = \langle\psi,g\rangle = \mathbb{E}(\psi(X_1))$
in model (1), where $\psi$ is a known integrable function with respect to the probability
measure associated to $g$. To study the rates of convergence of our estimators, we have to
take into account the smoothness of the function $\psi$. Thus $\psi$ is assumed to satisfy:

$$\forall x \in \mathbb{R}, \quad |\psi^*(x)|^2 \le C_\psi (x^2+1)^{-B}\exp(-2A|x|^R). \qquad (4)$$

The parameters $A$ and $R$ are non-negative real numbers, and $B$ is non-negative or such
that $\psi^* g^*$ is integrable. In particular, they can be zero if $g^*$ is integrable. We work under
the convention that if either $A = 0$ or $R = 0$, then we set $(A,R) = (0,0)$.

We exhibit the whole range of the rates of convergence for estimators of the functional
$\theta(g)$, depending on the parameters in (2)-(4). To the best of our knowledge, this general
rate description is new. We also extend the result to different dependency contexts, in
view of applications to particular hidden Markov models or AutoRegressive Conditionally
Heteroskedastic-type models.

The upper bounds for the rates follow from a squared-bias/ variance compromise. To 
obtain this compromise, we have to choose a smoothing parameter which depends on 
unknown quantities. Therefore, a data driven model selection type procedure is proposed. 
It is based on minimization of a penalized estimated criterion, which is different from 
the one intensively studied for mean integrated squared errors. The difficulty here lies
in finding an adequate criterion for the setting of a linear functional and mean squared 
error. The proposed procedure is inspired by Laurent et al. [24]. We give upper bounds 
for this adaptive method, with particular interest in the cases where a loss in the rate 
appeared with respect to the non-adaptive estimator. 
In the particular case of pointwise estimation, adaptive estimation in the direct problem
(i.e., when the $X_i$ are observed without noise) has been widely studied in the context
of the Gaussian white noise and regression models, see, for example, Lepski [26],
Tsybakov [28], Cai and Low [7, 8] (for more general linear functionals), Artiles and Levit [2],
Laurent et al. [24] and, in the context of density models, Lepski and Levit [25],
Butucea [3] and Artiles [1]. For the model of Gaussian sequences, Golubev and Levit [21]
and Golubev [20] considered adaptive estimation of linear functionals in both direct and
inverse problems. In the Gaussian white noise model, Goldenshluger [18] and
Goldenshluger and Pereverzev [19] considered pointwise estimation for the inverse problem on
classes of functions similar to $\mathcal{S}(b,0,0,L)$. Their adaptive procedure is based on Lepski's
procedure. Note also that in some particular inverse problems the pointwise adaptive
estimation was solved by Klemela and Tsybakov [22] for the Riesz transform and by
Cavalier [10] for the tomography problem. To the best of our knowledge, we present the
first work on adaptive estimation of general functionals of the form $\int\psi g$ in the context
of indirect observation (1).

We do not study optimality in the very general case: this would be very technical. But
we study as a first application the particular case of pointwise density deconvolution.
This case corresponds to $\psi^*(t) = e^{itx_0}$, which satisfies (4), meaning that we can choose
$\psi$ as the Dirac measure at $x_0$. This makes sense in our problem because the definition of
our estimator involves only $\psi^*$. We recover in this particular case the upper bound rates
obtained by Fan [16], Cator [9], Butucea [4] and Butucea and Tsybakov [6]. Moreover,
we prove the optimality in the minimax sense of the loss due to adaptation for Sobolev
smooth and supersmooth densities in the presence of ordinary smooth noise and for
supersmooth densities in the presence of supersmooth noise with $r > \rho$ and $0 < \rho \le 1$
(in the case $r < \rho$ no loss occurs, while the case $r > \rho$ and $1 < \rho \le 2$ is still open). As a
by-product we also prove in the last case that the rate of our estimator is optimal in the
minimax sense, which was not yet known in the literature.

Our estimation method is also illustrated for the discrete stochastic volatility model, 
where derivatives of the Laplace transform of the volatility are estimated with good rates.

The plan of the paper is as follows: In Section 2 we define the estimators and we
compute upper bounds for their mean squared error. In Section 3 the adaptive procedure
is detailed. Both independent and $\beta$-mixing contexts are studied. In Section 4, two
applications of our general results are detailed. Section 4.1 shows the application of the
results to adaptive pointwise deconvolution: upper bounds are deduced from Section 3
and the associated lower bounds are proven when a loss occurs. Section 4.2 presents an
application to the context of the stochastic volatility model. Most proofs are gathered in
Section 5.

2. Risk bound for the estimator 

We denote by $\langle\cdot,\cdot\rangle$ the $L_2$-scalar product ($\langle u,v\rangle = \int u(x)\overline{v(x)}\,dx$), by $\star$ the convolution
product of functions ($u\star v(x) = \int u(t)v(x-t)\,dt$) and by $u^*$ the Fourier transform of
$u \in L_1(\mathbb{R})$: $u^*(x) = \int e^{itx}u(t)\,dt$.

Recall that we want to estimate $\theta(g) = \langle\psi,g\rangle = \mathbb{E}(\psi(X_1))$ where $X_1$ follows model (1)
and is unobserved. Only the $Z_i$, for $i = 1,\dots,n$, are available. In what follows we assume
that

$$\psi g,\ \psi \text{ and } \psi^* g^* \text{ belong to } L_1(\mathbb{R}), \quad f_\varepsilon \text{ belongs to } L_2(\mathbb{R}) \text{ and is such that } \forall x\in\mathbb{R},\ f_\varepsilon^*(x) \ne 0. \qquad (5)$$

Note that the square integrability of $f_\varepsilon$ requires that $\gamma > 1/2$ when $\rho = 0$ in (2).

Moreover, we generalize these results to distributions having Fourier transform such
that $\int|\psi^* g^*| < \infty$. For example, we estimate $g(x_0)$ for some fixed $x_0$ when we take $\psi = \delta_{x_0}$,
the Dirac measure at $x_0$, having Fourier transform equal to $\psi^*(t) = e^{-itx_0}$. For estimating
the derivatives $g^{(k)}(x_0)$, when they exist, we consider $\psi$ such that $\psi^*(t) = (-it)^k e^{-itx_0}$.

2.1. The estimator 



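We write, using (5), $\langle\psi,g\rangle = (1/2\pi)\langle\psi^*,g^*\rangle = (1/2\pi)\langle\psi^*, f_Z^*/f_\varepsilon^*\rangle$. Replacing $f_Z^*(t)$ by its
empirical version $(1/n)\sum_{k=1}^n e^{itZ_k}$ leads to the estimator

$$\frac{1}{2\pi n}\sum_{k=1}^n \int \frac{\psi^*(-t)}{f_\varepsilon^*(t)}\, e^{itZ_k}\,dt. \qquad (6)$$

This estimator is explicit and seems attractive. Unfortunately, the integral diverges for
many choices of $f_\varepsilon^*$; for instance, when $\varepsilon$ is a Gaussian noise. To overcome such issues, we
suggest regularization and take the following estimator of $\theta(g)$:

$$\hat\theta_m = \frac{1}{2\pi n}\sum_{k=1}^n \int_{-\pi m}^{\pi m} \frac{\psi^*(-t)}{f_\varepsilon^*(t)}\, e^{itZ_k}\,dt, \qquad (7)$$

where $m$ is an integer.

Remark 2.1. Let $\hat g_m$ denote the projection estimator of $g$ defined in Comte et al. [14].
Then we can prove that $\hat\theta_m = \langle\hat g_m,\psi\rangle$.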
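The truncated estimator can be sketched numerically as follows. This is a minimal illustration, assuming a trapezoidal quadrature on a regular grid and a toy setup (standard Gaussian $X$, Laplace noise, pointwise estimation at $x_0 = 0$); the function `theta_hat`, the grid size, the sample size and the noise scale are hypothetical choices that do not come from the paper.

```python
import numpy as np

def theta_hat(z, m, psi_star, f_eps_star, n_grid=401):
    """Truncated Fourier estimator: (1 / 2 pi n) sum_k int_{-pi m}^{pi m}
    psi*(-t) / f_eps*(t) * exp(i t Z_k) dt, approximated by the trapezoidal rule."""
    t = np.linspace(-np.pi * m, np.pi * m, n_grid)
    # Empirical characteristic function of the observations Z_1, ..., Z_n.
    ecf = np.exp(1j * np.outer(t, z)).mean(axis=1)
    integrand = psi_star(-t) / f_eps_star(t) * ecf
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    # The imaginary part vanishes in theory for real-valued psi; keep the real part.
    return float(np.real(integral) / (2.0 * np.pi))

# Toy setup (assumption): X ~ N(0, 1), Laplace noise with scale 0.5, target g(0).
rng = np.random.default_rng(0)
n, x0, scale = 4000, 0.0, 0.5
z = rng.standard_normal(n) + rng.laplace(loc=0.0, scale=scale, size=n)

psi_star = lambda t: np.exp(1j * t * x0)                 # Dirac mass at x0
f_eps_star = lambda t: 1.0 / (1.0 + (scale * t) ** 2)    # Laplace noise

estimate = theta_hat(z, m=1.0, psi_star=psi_star, f_eps_star=f_eps_star)
true_value = 1.0 / np.sqrt(2.0 * np.pi)                  # g(0) for the N(0, 1) density
```

With these settings the estimate should be close to $g(0) \approx 0.399$; increasing `m` lowers the bias but inflates the variance through the factor $1/f_\varepsilon^*$.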



2.2. Risk bounds and rates for i.i.d. variables X^'s 

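If (5) holds and if, moreover, $\psi^*(-\cdot)/f_\varepsilon^*$ is integrable, then $m = +\infty$ can be chosen and
the estimator $\hat\theta = \hat\theta_\infty$ is unbiased and has a parametric rate.

Otherwise, we have $\mathbb{E}(\theta-\hat\theta_m)^2 = b^2(\hat\theta_m) + \mathrm{Var}(\hat\theta_m)$ with $b(\hat\theta_m) = \theta - \mathbb{E}(\hat\theta_m)$. As $\mathbb{E}(\hat\theta_m) =
(2\pi)^{-1}\int_{|t|\le\pi m} g^*(t)\psi^*(-t)\,dt$, we obtain

$$b(\hat\theta_m) = \frac{1}{2\pi}\Big(\int g^*(t)\psi^*(-t)\,dt - \int_{|t|\le\pi m} g^*(t)\psi^*(-t)\,dt\Big) = \frac{1}{2\pi}\int_{|t|\ge\pi m} g^*(t)\psi^*(-t)\,dt. \qquad (8)$$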
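Under (5), $b(\hat\theta_m)$ tends to 0 when $m$ grows to infinity.
For the variance term, write

$$\mathrm{Var}(\hat\theta_m) = \frac{1}{4\pi^2 n}\,\mathrm{Var}\Big(\int_{-\pi m}^{\pi m}\frac{\psi^*(-u)}{f_\varepsilon^*(u)}\,e^{iuZ_1}\,du\Big).$$

First, the following bound holds:

$$\mathrm{Var}(\hat\theta_m) \le \frac{1}{4\pi^2 n}\Big(\int_{-\pi m}^{\pi m}\frac{|\psi^*(-u)|}{|f_\varepsilon^*(u)|}\,du\Big)^2.$$

Next, the variance can also be bounded as follows:

$$\mathrm{Var}(\hat\theta_m) \le \frac{1}{4\pi^2 n}\,\mathbb{E}\Big|\int_{-\pi m}^{\pi m}\frac{\psi^*(-u)}{f_\varepsilon^*(u)}\,e^{iuZ_1}\,du\Big|^2 = \frac{1}{4\pi^2 n}\int_{-\pi m}^{\pi m}\int_{-\pi m}^{\pi m}\frac{\psi^*(-u)\psi^*(v)}{f_\varepsilon^*(u)f_\varepsilon^*(-v)}\,f_Z^*(u-v)\,du\,dv.$$

We use the Cauchy-Schwarz inequality and Fubini's theorem:

$$\mathrm{Var}(\hat\theta_m) \le \frac{1}{4\pi^2 n}\Big(\int_{-\pi m}^{\pi m}\int_{-\pi m}^{\pi m}\Big|\frac{\psi^*(-u)}{f_\varepsilon^*(u)}\Big|^2|f_Z^*(u-v)|\,du\,dv\Big)^{1/2}\Big(\int_{-\pi m}^{\pi m}\int_{-\pi m}^{\pi m}\Big|\frac{\psi^*(v)}{f_\varepsilon^*(-v)}\Big|^2|f_Z^*(u-v)|\,du\,dv\Big)^{1/2}.$$

Note that since $\psi$ is a real valued function we have $|\psi^*(-t)| = |\psi^*(t)|$. As $\int|f_Z^*(x)|\,dx \le
\int|f_\varepsilon^*(x)|\,dx$, we have the following result: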

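Proposition 2.1. Assume that $C_\varepsilon = \int|f_\varepsilon^*(x)|\,dx < +\infty$, and let $\hat\theta_m$ be defined by (7).
Then, under (5),

$$\mathbb{E}(\theta-\hat\theta_m)^2 \le \Big(\frac{1}{2\pi}\int_{|t|\ge\pi m}|g^*(t)\psi^*(t)|\,dt\Big)^2 + \frac{1}{4\pi^2 n}\min\Big\{C_\varepsilon\int_{-\pi m}^{\pi m}\frac{|\psi^*(t)|^2}{|f_\varepsilon^*(t)|^2}\,dt,\ \Big(\int_{-\pi m}^{\pi m}\frac{|\psi^*(t)|}{|f_\varepsilon^*(t)|}\,dt\Big)^2\Big\}.$$

Note that we also have $\int|f_Z^*(x)|\,dx \le \|f_\varepsilon^*\|\,\|g^*\| = 2\pi\|f_\varepsilon\|\,\|g\|$, if $f_\varepsilon$ and $g$ are square
integrable.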

Remark 2.2. If, in addition,

$$\int|\psi^*(x)/f_\varepsilon^*(x)|^2\,dx < +\infty, \qquad (9)$$

then the variance of $\hat\theta_m$ is of order $1/n$ and the estimator can reach the parametric rate,
for $m$ large enough. Note that a condition like $\int|\psi^*(x)/f_\varepsilon^*(x)|\,dx < \infty$ (which ensures
that (6) is well defined) is generally stronger than (9) as convergence problems lie only
near infinity. Moreover, such conditions are fulfilled if $\psi^*$ decreases faster than $f_\varepsilon^*$ near
infinity, which corresponds to the intuitive idea that $\psi$ is a smoother function than $f_\varepsilon$.
For example, this happens if $\psi$ is supersmooth and $f_\varepsilon$ is ordinary smooth.

Thus we can study the rates that can be deduced from the upper bounds of Proposition
2.1, as a function of the smoothness parameters of the three functions involved, $g$, $\psi$, $f_\varepsilon$.
To do so, let us assume that $\psi$ satisfies (4), that $g$ belongs to $\mathcal{S}(b,a,r,L)$ as defined by
(3) and that $f_\varepsilon^*$ fulfills (2). Then, use (8), (3) and (4) to get

$$b^2(\hat\theta_m) \le \Big(\frac{1}{2\pi}\int_{|x|\ge\pi m}|g^*(x)|(1+x^2)^{b/2}\exp(a|x|^r)\,\big(|\psi^*(x)|(1+x^2)^{-b/2}\exp(-a|x|^r)\big)\,dx\Big)^2$$

$$\le \frac{1}{4\pi^2}\int_{|x|\ge\pi m}|g^*(x)|^2(1+x^2)^{b}\exp(2a|x|^r)\,dx\ \int_{|x|\ge\pi m}|\psi^*(x)|^2(1+x^2)^{-b}\exp(-2a|x|^r)\,dx$$

$$\le \frac{LC_\psi}{2\pi}\int_{|x|\ge\pi m}(1+x^2)^{-b-B}\exp(-2a|x|^r-2A|x|^R)\,dx$$

$$\le C_1 m^{-2b-2B-\max(r,R)+1}\exp(-2a(\pi m)^r-2A(\pi m)^R).$$

On the other hand, the noise plays an important role on the variance of the estimator: 

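• Case (I): If ($\rho = R = 0$, $\gamma < B - 1/2$) or ($\rho = R > 0$, $\alpha = A$, $\gamma < B - 1/2$) or ($\rho = R$,
$\alpha < A$) or ($\rho < R$), then $\mathrm{Var}(\hat\theta_m) \le Cn^{-1}$;

• Case (II): If ($\rho = R = 0$, $\gamma = B - 1/2$) or ($\rho = R > 0$, $\alpha = A$, $\gamma = B - 1/2$), then
$\mathrm{Var}(\hat\theta_m) \le C\ln(m)\,n^{-1}$;

• Case (III): If ($\rho = R = 0$, $\gamma > B - 1/2$) or ($\rho = R > 0$, $\alpha = A$, $\gamma > B - 1/2$), then
$\mathrm{Var}(\hat\theta_m) \le Cm^{2\gamma-2B+1}n^{-1}$;

• Case (IV): If ($\rho > R$) or ($\rho = R > 0$, $\alpha > A$), then

$\mathrm{Var}(\hat\theta_m) \le Cn^{-1}m^{2\gamma-2B+1-\rho+(1-\rho)_+}e^{2\alpha(\pi m)^\rho-2A(\pi m)^R}$.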
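The case distinction above depends only on the noise parameters $(\gamma,\alpha,\rho)$ and the parameters $(B,A,R)$ of $\psi$. It can be encoded as a small lookup; the sketch below is illustrative, the function name `variance_case` is hypothetical, and exact floating-point comparisons are used only for simplicity.

```python
def variance_case(gamma, alpha, rho, B, A, R):
    """Return "I", "II", "III" or "IV" according to the four variance cases.

    Conventions of the text are applied first: (alpha, rho) = (0, 0) if either
    is zero, and likewise (A, R) = (0, 0) if either is zero.
    """
    if alpha == 0 or rho == 0:
        alpha, rho = 0.0, 0.0
    if A == 0 or R == 0:
        A, R = 0.0, 0.0

    # psi* decays exponentially faster than the noise: parametric variance.
    if rho < R or (rho == R and alpha < A):
        return "I"
    # Noise decays exponentially faster than psi*: exponential variance.
    if rho > R or (rho == R and rho > 0 and alpha > A):
        return "IV"
    # Remaining situation: rho == R and alpha == A, so only the
    # polynomial parts gamma and B matter.
    if gamma < B - 0.5:
        return "I"
    if gamma == B - 0.5:
        return "II"
    return "III"
```

For instance, pointwise estimation ($B = A = R = 0$) with Laplace noise falls in Case (III), while the same problem with Gaussian noise falls in Case (IV).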

We summarize in Table 1 the scenarios that arise when one minimizes over $m$ the sum
of the upper bounds on the bias and the variance. Let $a \vee b = \max\{a,b\}$. Note that in
cases (8) and (9) of Table 1 the rate is given by

$$v_n = \min_m\Big(C_1 m^{-2b-2B+1-r\vee R}\,e^{-2a(\pi m)^r-2A(\pi m)^R} + C\,\frac{m^{2\gamma-2B+1-\rho+(1-\rho)_+}\,e^{2\alpha(\pi m)^\rho-2A(\pi m)^R}}{n}\Big). \qquad (10)$$
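Since the minimizer of the squared-bias plus variance bound has no closed form in the supersmooth cases, it can be approximated by a grid search. The following is a rough numerical sketch: the constants `C1` and `C` are unknown in practice and are set to 1 here as an assumption, and the grid for $m$ is an arbitrary illustrative choice.

```python
import numpy as np

def rate_bound(n, b, a, r, gamma, alpha, rho, B=0.0, A=0.0, R=0.0,
               C1=1.0, C=1.0, m_grid=None):
    """Grid-search the sum of the squared-bias and variance upper bounds over m.

    Returns (m_opt, bound). C1 and C are placeholder constants (assumption).
    """
    if m_grid is None:
        m_grid = np.linspace(0.05, 5.0, 2000)
    pm = np.pi * m_grid
    # Logarithm of the squared-bias bound.
    log_bias = (np.log(C1) + (-2 * b - 2 * B + 1 - max(r, R)) * np.log(m_grid)
                - 2 * a * pm ** r - 2 * A * pm ** R)
    # Logarithm of the Case (IV) variance bound.
    log_var = (np.log(C)
               + (2 * gamma - 2 * B + 1 - rho + max(1 - rho, 0.0)) * np.log(m_grid)
               + 2 * alpha * pm ** rho - 2 * A * pm ** R - np.log(n))
    total = np.exp(log_bias) + np.exp(log_var)
    k = int(np.argmin(total))
    return float(m_grid[k]), float(total[k])

# Supersmooth density (a = 1, r = 1, b = 1) observed with Gaussian noise
# (gamma = 0, alpha = 1/2, rho = 2), pointwise case B = A = R = 0.
m_small, v_small = rate_bound(10 ** 3, b=1.0, a=1.0, r=1.0, gamma=0.0, alpha=0.5, rho=2.0)
m_large, v_large = rate_bound(10 ** 6, b=1.0, a=1.0, r=1.0, gamma=0.0, alpha=0.5, rho=2.0)
```

The bound decreases with $n$ but, in this configuration, much more slowly than $1/n$, in line with the discussion of cases (8) and (9).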
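Table 1. Upper bounds for the minimax rates of convergence, $\delta_1 = (2\gamma-2B+1)/(r\vee R)$,
$\delta_2 = (b+B-1/2)/(b+\gamma)$ and $\delta_3 = (2(b+B)-1)/\rho$

| Parameters | | | Rates | Adaptive rates |
|---|---|---|---|---|
| $\rho < R$ | | | (1) $n^{-1}$ | |
| $\rho = R$, $\alpha < A$ | | | (2) $n^{-1}$ | |
| ($\rho = R = 0$) or ($\rho = R > 0$, $\alpha = A$) | $\gamma < B - 1/2$ | | (3) $n^{-1}$ | |
| | $\gamma = B - 1/2$ | $r\vee R > 0$ | (4) $(\ln\ln n)\,n^{-1}$ | |
| | | $r\vee R = 0$ | (5) $(\ln n)\,n^{-1}$ | |
| | $\gamma > B - 1/2$ | $r\vee R > 0$ | (6) $(\ln n)^{\delta_1}n^{-1}$ | $\ln\ln(n)\,(\ln n)^{\delta_1}n^{-1}$ |
| | | $r\vee R = 0$ | (7) $n^{-\delta_2}$ | $(n/\ln(n))^{-\delta_2}$ |
| ($\rho = R > 0$, $\alpha > A$) | | | (8) $v_n$ in (10) | $v_n(\ln(n))^{\delta_4}$, $0 \le \delta_4 \le 1$ |
| $\rho > R$ | | $r\vee R > 0$ | (9) $v_n$ in (10) | $v_n(\ln(n))^{\delta_4}$, $0 \le \delta_4 \le 1$ |
| | | $r\vee R = 0$ | (10) $\ln(n)^{-\delta_3}$ | |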



These rates are strictly faster than $(\ln(n))^{-\lambda_1}$, that is, $v_n = o((\ln(n))^{-\lambda_1})$ for any $\lambda_1 > 0$,
and generally slower than $n^{-\lambda_2}$, $\lambda_2 > 0$ (negative powers of $n$ can be obtained). For precise
(but cumbersome) formulae in similar cases, we refer to Lacour [23]. We give in Section
5 the orders of the $m$ associated to the rates.

Remark 2.3. Different optimal choices of $m$ depend on the unknown parameters related
to $g$ (see Section 5.1), hence the interest in an automatic selection procedure for $m$.

2.3. Extension to mixing contexts 

In view of applications, it is natural to study the robustness of our method when the
variables $X_i$ are $\beta$-mixing. To be more precise, two dependence contexts are considered.

(D1) In Model (1), the sequences $(X_i)$ and $(\varepsilon_i)$ are independent and the $\varepsilon_i$ are i.i.d.
The sequence $(X_i)$ is strongly stationary and $\beta$-mixing, with $\beta$-mixing coefficients
denoted by $(\beta_k)_k$.

(D2) In Model (1), the $\varepsilon_i$ are i.i.d. and, for any given $i$, $X_i$ and $\varepsilon_i$ are independent
(but the sequences $(X_i)$ and $(\varepsilon_i)$ are not independent). The sequence $(Z_i,X_i)_{i\in\mathbb{N}}$ is
strongly stationary and $\beta$-mixing, with $\beta$-mixing coefficients denoted by $(\beta_k)_k$.

Context (D1) encompasses the case of particular hidden Markov models, when the
noise is additive and $(X_i)$ is a $\beta$-mixing Markov process. As many Markov chain models
or other standard models can be proved to have such mixing properties (see Doukhan
[15] for a large number of examples and study of their mixing properties), this means
that our results can be applied to many classical models. In that case, we can prove the
following result:

Proposition 2.2. Consider the model (1) under (D1) with moreover $\sum_{k\ge 0}\beta_k < +\infty$.
Assume that $C_\varepsilon = \int|f_\varepsilon^*(x)|\,dx < +\infty$. Let $\hat\theta_m$ be defined by (7). Then

no-o.nf<{^l \9*{t)r{t)\dt) 

n 

In particular, if $\int|\psi^*(t)|\,dt < +\infty$, then the last term in the right-hand side of

(11) is of order 0(l/n). Moreover, in any case, we have in (11), 

\t\<7Xra ) ' ^ i —TX-m 

so that the last term is always less than or equal to the variance term. It follows that the
rates, in the context of mixing $X_k$ described by assumption (D1), remain the same as in
the independent setting.

Context (D2) is related to ARCH models. Indeed, general ARCH models can be
formulated as follows: Let $(\eta_i)$ be an i.i.d. noise sequence.

$$Y_i = \sigma_i\eta_i \quad\text{with}\quad \sigma_i = F(\eta_{i-1},\eta_{i-2},\dots), \qquad (12)$$

for some measurable function $F$, or

$$Y_i = \sigma_i\eta_i \quad\text{with}\quad \sigma_i = F(\sigma_{i-1},\eta_{i-1}) \text{ and } \sigma_0 \text{ independent of } (\eta_i)_{i\ge 0}. \qquad (13)$$

Many examples can be found in the literature, and conditions can be given under which
the process $(Y_i,\sigma_i)_{i\in\mathbb{Z}}$ is geometrically $\beta$-mixing; we refer to Comte et al. [12] for a review
of the examples and the references therein. Clearly then, $Z_i = \ln(Y_i^2)$, $X_i = \ln(\sigma_i^2)$ and
$\varepsilon_i = \ln(\eta_i^2)$ follow model (1) and satisfy conditions given by (D2). We can prove the
following result in this context:

Proposition 2.3. Consider the model (1) under (D2) with moreover $\sum_{k\ge 0}\beta_k < +\infty$.
Assume that $C_\varepsilon = \int|f_\varepsilon^*(x)|\,dx < +\infty$. Let $\hat\theta_m$ be defined by (7). Then

no-emf<(^ I \9*{t)r{t)\Ai)^ 
Thus, the procedure attains the rates of the independent case as soon as, for some
constant $C$,

This does not hold in general, but in particular cases. For instance, if $f_\varepsilon$ satisfies (2) and
if $\psi$ satisfies (4) together with

$$|\psi^*(x)|^2 \ge C_\psi'(x^2+1)^{-B}\exp(-2A|x|^R), \qquad (15)$$

with either $\gamma > \max(R,1)$ or ($A > 0$, $\rho > 0$), then, under the assumptions of Proposition
2.3, 

where is a constant. 

It follows from (16) that the rates given in Table 1 are preserved in this $\beta$-mixing
context whenever the $\varepsilon_i$ are supersmooth.

Taking $\psi^*(t) = e^{itx_0}$ for any $x_0$ (as in Section 4.1 below) allows one to provide a pointwise
density estimator that retains the rate of the independent case if $\gamma > 1$. We recover
the results obtained by the kernel estimator of van Es et al. [29]. Our results are more
general since van Es et al. [29] only consider a multiplicative Gaussian noise (implying
supersmooth $\varepsilon_i$, see Section 4.2) and do not study adaptation (which is not useful in
their particular case). Other functionals $\langle\psi,g\rangle$ may be estimated with our procedure.
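The reduction of an ARCH-type model to the additive model (1) in context (D2) can be illustrated by simulation. The sketch below assumes an ARCH(1) recursion $\sigma_i^2 = \omega + a_1 Y_{i-1}^2$ with Gaussian innovations, which is one instance of the recursive form with $\sigma_i = F(\sigma_{i-1},\eta_{i-1})$; the parameter values and the function name `simulate_arch1` are hypothetical.

```python
import numpy as np

def simulate_arch1(n, omega=0.5, a1=0.4, seed=0):
    """Simulate Y_i = sigma_i * eta_i with sigma_i^2 = omega + a1 * Y_{i-1}^2."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    sigma2 = np.empty(n)
    y = np.empty(n)
    sigma2[0] = omega / (1.0 - a1)        # stationary variance as starting value
    y[0] = np.sqrt(sigma2[0]) * eta[0]
    for i in range(1, n):
        sigma2[i] = omega + a1 * y[i - 1] ** 2
        y[i] = np.sqrt(sigma2[i]) * eta[i]
    return y, sigma2, eta

y, sigma2, eta = simulate_arch1(1000)

# Log-square transformation: Z_i = X_i + eps_i as in model (1).
z = np.log(y ** 2)
x = np.log(sigma2)
eps = np.log(eta ** 2)
```

Here $X_i$ depends on the past innovations, so the sequences $(X_i)$ and $(\varepsilon_i)$ are not independent, although $X_i$ and $\varepsilon_i$ are independent for each fixed $i$.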

3. Adaptive estimation 

Now, we provide a strategy leading to an automatic choice of $m$. Note that such model
selection has an interest only in the case $\int|\psi^*/f_\varepsilon^*| = +\infty$ and $\int|\psi^*/f_\varepsilon^*|^2 = +\infty$ since
otherwise the variance is of order $1/n$ and the rate is parametric. As $\psi$ and $f_\varepsilon$ are assumed
to be known, these conditions can be explicitly checked.

Let us describe briefly the heuristics that follow Laurent et al. [24]. Let $\theta_m = \mathbb{E}(\hat\theta_m) =
(2\pi)^{-1}\int_{-\pi m}^{\pi m}g^*(t)\psi^*(-t)\,dt$. The approximation of the squared bias $(\theta(g)-\theta_m)^2$ is obtained
by replacing it by $(\theta_j-\theta_m)^2$ for $j > m$, $j$ great enough, and then by $(\hat\theta_j-\hat\theta_m)^2$. This
approximation in turn introduces a bias which must be corrected (see $H(j,m)$ below).
The variance term is replaced by a penalty function $\mathrm{pen}(\cdot)$ from $\mathbb{N}$ into $\mathbb{R}_+$. This gives
the theoretical criterion

$$\mathrm{Crit}(m) = \sup_{j\ge m}(\theta_j-\theta_m)^2 + \mathrm{pen}(m),$$

where $\mathrm{pen}(m)$ has the order of the variance term (see Section 2.2) and its empirical
version is

$$\widehat{\mathrm{Crit}}(m) = \sup_{j\ge m,\ j\in\mathcal{M}}\big[(\hat\theta_m-\hat\theta_j)^2 - H(j,m)\big] + \mathrm{pen}(m),$$

where $H(j,m)$ is an additional bias correction and $\mathcal{M}$ is a subset of $\mathbb{N}$. Then, we can
define

$$\hat m = \min\Big\{m\in\mathcal{M},\ \widehat{\mathrm{Crit}}(m) \le \inf_{j\in\mathcal{M}}\widehat{\mathrm{Crit}}(j) + \frac1n\Big\} \qquad (17)$$

as the model selection procedure. It remains to find $\mathrm{pen}(\cdot)$ and $H(j,m)$ that make the
procedure work and give good rates for $\hat\theta_{\hat m}$.
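Recall that $C_\varepsilon = \int|f_\varepsilon^*(x)|\,dx$. Let $x_m$ be some positive weights to be chosen, and let
$a > 0$. We define:

$$\mathrm{pen}(m) = 4\Big(1+\frac1a\Big)\big(x_m\sigma_m^2 + x_m^2c_m^2\big), \qquad (18)$$

where $\sigma_m^2 = \sigma_{0,m}^2$, $c_m = c_{0,m}$, with $\sigma_{j,m}^2$ and $c_{j,m}$ defined by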



27m 



Tc{jAni)<\x\<Tc{j\/ni) 



rix) 



dx. 



and 



Let also 



2nn 



\rix)\ 

n{j Am)<\x\<n{j\/ m) 

r{x) 



7z{j /\m)<\x\<7i{j\/ ni) 



1 



dx. 



$$H(j,m) = 4\Big(1+\frac1a\Big)\big(x_j\sigma_{j,m}^2 + x_j^2c_{j,m}^2\big). \qquad (19)$$
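The selection rule can be sketched in code once the estimators, the penalty and the correction term have been computed on a finite collection of models. In the sketch below the inputs are plain arrays supplied by the user; the tolerance `slack` added to the minimal criterion, the function name `select_m` and the toy numbers are assumptions of this illustration, and $H$ is set to zero in the example only to keep it short.

```python
import numpy as np

def select_m(theta_hat, pen, H, slack=0.0):
    """Penalized selection among models indexed 0, ..., K-1 (increasing m).

    theta_hat[k]: estimator for the k-th model, pen[k]: its penalty,
    H[j, k]: correction term for the pair (j, k) with j >= k.
    Returns the first index whose criterion is within `slack` of the minimum,
    together with the vector of criteria.
    """
    K = len(theta_hat)
    crit = np.empty(K)
    for k in range(K):
        js = np.arange(k, K)
        # Estimated squared bias: largest corrected squared gap to larger models.
        gaps = (theta_hat[k] - theta_hat[js]) ** 2 - H[js, k]
        crit[k] = np.max(gaps) + pen[k]
    best = crit.min()
    m_hat = int(np.argmax(crit <= best + slack))  # first index meeting the threshold
    return m_hat, crit

# Toy illustration: estimators stabilize after the second model,
# while the penalty keeps growing with the model index.
theta_toy = np.array([0.0, 0.5, 0.52, 0.521])
pen_toy = np.array([0.001, 0.002, 0.004, 0.008])
H_toy = np.zeros((4, 4))
m_hat, crit = select_m(theta_toy, pen_toy, H_toy)
```

In this toy run the second model (index 1) is selected: smaller models pay a large estimated bias, larger ones a larger penalty.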



We shall prove the following theorem: 



Theorem 3.1. Consider model (1) where $(X_i)_{1\le i\le n}$ and $(\varepsilon_i)_{1\le i\le n}$ are independent
sequences of i.i.d. random variables and assume that (5) is fulfilled. Let $\hat\theta_{\hat m}$ be defined by
(7) and (17)-(19) when $\int|\psi^*/f_\varepsilon^*| = +\infty$ and $\int|\psi^*/f_\varepsilon^*|^2 = +\infty$. Then there exists some
positive constant $C(a)$ depending only on some $a > 0$, such that

$$\mathbb{E}[(\hat\theta_{\hat m}-\theta)^2] \le C(a)\inf_{m\in\mathcal{M}}\Big[\Big(\int_{|x|\ge\pi m}|\psi^*(x)g^*(x)|\,dx\Big)^2 + \mathrm{pen}(m)\Big] + C\sum_{m\in\mathcal{M}}e^{-x_m}\omega_m^2,$$

where $\omega_m^2 = \sigma_m^2\vee c_m + 2(\sigma_m\vee c_m)^2$ and $C$ is a constant.

Theorem 3.1 states that $\hat\theta_{\hat m}$ leads to an automatic tradeoff between the squared bias
term $(\int_{|x|\ge\pi m}|\psi^*(x)g^*(x)|\,dx)^2$ and $\mathrm{pen}(m)$, if the $x_m$ are chosen so that $\sum_{m\in\mathcal{M}}e^{-x_m}\omega_m^2 =
O(1/n)$. However, as the main term in $\mathrm{pen}(m)$ is clearly $x_m\sigma_m^2$, where $\sigma_m^2$ is the variance
of $\hat\theta_m$, $x_m$ represents a loss in the variance (not necessarily in the rate).

Now, let us discuss the possible choices for $x_m$ in order to see what loss occurs, if
any, when using the adaptive procedure. We assume that $b + R > 1$, so that we can
take $\mathcal{M} = \{1,2,\dots,[\sqrt n]\}$, where $[\sqrt n]$ is the greatest integer less than $\sqrt n$. The possible
choices for $x_m$ are discussed with respect to the upper bounds on the variance given in
Section 2.2:

• Case (II): We take $x_m = 2\ln(m)$ and the rate becomes of order $(\ln\ln(n))^2/n$ instead
of $\ln\ln(n)/n$ or of order $\ln^2(n)/n$ instead of $\ln(n)/n$.

• Case (III): We take $x_m = (2\gamma-2B+3)\ln(m)$ and the rate becomes of order
$\ln\ln(n)\ln^{\delta_1}(n)/n$ instead of $\ln^{\delta_1}(n)/n$ and of order $(n/\ln(n))^{-[(b+B)-1/2]/(b+\gamma)}$
instead of $n^{-[(b+B)-1/2]/(b+\gamma)}$.

• Case (IV): We take $x_m = 4\alpha(\pi m)^\rho$. There is no loss in case (10) if $\rho > 0$, $r = R = 0$.
In the two other cases, (8) and (9), a loss in the variance occurs. If the bias is
dominating (if $r < \rho$), there is no loss in the rate. Otherwise, as the optimal $m$
is less than $(\ln(n)/C)^{1/\rho}$, for some $C > 0$, the loss in the rate is at most of order
$O(\ln(n))$. Note that the rate being faster than logarithmic in this case, the loss
remains negligible with respect to the rate.

The adaptive rates are given in the last column of Table 1. Let us emphasize that the
rates presented in both the second and third columns of Table 1 are new in such a general
setup.

Moreover, if we want to extend the adaptive result to the mixing case, we can use the
Bernstein inequality given in Doukhan [15] or in Butucea and Neumann [5] provided that
the mixing is geometrical. We can prove the following corollary of Theorem 3.1:

Corollary 3.1. Consider model (1) under (D1) or under (D2) with $f_\varepsilon$ satisfying (2) and
$\psi$ satisfying (4) and (15) with either $\gamma > \max(R,1)$ or $A, \rho > 0$, and assume in both cases
that $\beta_k \le e^{-ck}$ for any $k\in\mathbb{N}$. Then if (5) is fulfilled and if $\int|\psi^*(t)|\,dt < +\infty$, $\int|\psi^*/f_\varepsilon^*| =
+\infty$ and $\int|\psi^*/f_\varepsilon^*|^2 = +\infty$, the result of Theorem 3.1 for $\hat\theta_{\hat m}$ defined in the same way
holds with $c_m$, $c_{j,m}$ replaced by $2c_m\ln(n)/c$, $2c_{j,m}\ln(n)/c$ and $\sigma_m^2$, $\sigma_{j,m}^2$ multiplied by 2.

Clearly, the constant $c$ appearing in the $c_m$, $c_{j,m}$ is unknown, but these terms have in
general negligible orders when compared to the $\sigma_m^2$, $\sigma_{j,m}^2$. In that case, these terms can
be omitted in the definition of the estimator and the procedure does not depend on the
mixing coefficients (see the example in Section 4.2).

4. Applications 

4.1. Pointwise estimation 

Pointwise estimation of $g$, also called pointwise deconvolution, is a particular case of our
general setting and the most studied example in the literature. In this section, we give a
full description of minimax and adaptive rates.

We check that our estimation procedure attains the minimax and adaptive rates (when
known) in this context and that it provides the rates for the other setups. Very few results
are available on the optimality of the rates in the adaptive setup and we prove here such
results.

Let $\Lambda = [\underline b,\bar b]\times[\underline a,\bar a]\times[\underline r,\bar r]\times[\underline L,\bar L] \subset [0,\infty)\times[0,\infty)\times(0,2]\times(0,\infty)$ be a set of
parameters $\lambda = (b,a,r,L)$. We shall denote by $\varphi_n$ the minimax rate of convergence over the
class $\mathcal{S}(\lambda)$; see, for example, Butucea [4] for a definition. We shall say that an estimator
is adaptive minimax over the family of classes $\mathcal{S}(\lambda)$, $\lambda\in\Lambda$, if it attains the minimax rate
$\varphi_n$ uniformly in $\lambda$.

It is not always possible to attain the minimax rate uniformly over a set of parameters
$\Lambda$. It may happen that there is a loss in the rate due to adaptation, see Lepski [26]. We
shall say that an estimator is adaptive for the adaptive rate $\phi_n$ if it attains this rate
uniformly in $\lambda$ over $\Lambda$ and if, moreover, the lower bounds hold for this rate uniformly in
$\lambda$ over $\Lambda$. For a definition, see Butucea [3].

For pointwise estimation of $g$, we can take $\psi(x) = \delta_{\{x_0\}}(x)$ for any given $x_0$, where
$\delta_{\{x_0\}}$ is the Dirac measure at $x_0$. This implies $\psi^*(t) = e^{itx_0}$ and $|\psi^*(t)| = 1$. Therefore,
the rates correspond to the particular case $B = A = R = 0$ in (4) and in Table 1. They
are summarized more simply in Table 2. Our procedures attain the rates already found
in pointwise deconvolution and cover all other previously unknown setups.
When $r > 0$, $\rho > 0$, the value of $m_n$ is not explicitly given. It is obtained as the solution
of the equation

$$m_n^{2b+2\gamma+(1-\rho)_+}\exp\{2\alpha(\pi m_n)^\rho + 2a(\pi m_n)^r\} = O(n). \qquad (20)$$

Consequently, the rate of $\hat g_{m_n}$ is not easy to give explicitly and depends on the ratio $r/\rho$.
If $r/\rho$ or $\rho/r$ belongs to $]k/(k+1); (k+1)/(k+2)]$ with integer $k$, the rate of convergence
can be expressed as a function of $k$. For explicit formulae for the rates, see Lacour [23].

These rates are known to be optimal in the minimax sense as indicated in Table 2.
The case $r = 0$ is studied by Fan [16], the case $r = 0$, $\rho > 0$ by Cator [9] and the case
$r > 0$, $\rho = 0$ by Butucea [4]. The rate in the case $r > 0$, $\rho > 0$, $\gamma = 0$ is proven optimal in
the minimax sense in Butucea and Tsybakov [6] for $r < \rho$. By using their construction
and by following the same proof, we get near optimality (within a log factor) in the case
$r > \rho$.

Table 2. Choice of $m_n$ for pointwise deconvolution and corresponding rates under assumptions
(2) and (3). Adaptive rates for comparison. $B_m$ is abbreviated for $m^{-2b+1-r}\exp(-2a(\pi m)^r)$
and $V_m$ for $m^{2\gamma+1-\rho+(1-\rho)_+}\exp(2\alpha(\pi m)^\rho)/n$

$r = 0$ (Sob.($b$)), $\rho = 0$ (ordinary smooth noise):
    $\varphi_n^2 = O(n^{-(2b-1)/(2b+2\gamma)})$, minimax rate (Fan [16]);
    $\phi_n^2 = O((n/\ln(n))^{-(2b-1)/(2b+2\gamma)})$, adaptive rate (NEW).

$r = 0$ (Sob.($b$)), $\rho > 0$ (supersmooth noise):
    $\pi m_n = [\ln(n)/(2\alpha+1)]^{1/\rho}$;
    $\varphi_n^2 = O((\ln(n))^{-(2b-1)/\rho})$, minimax rate (Fan [16]);
    $\phi_n^2 = O((\ln(n))^{-(2b-1)/\rho})$, adaptive minimax rate (no loss) (Cator [9]).

$r > 0$, $\rho = 0$ (ordinary smooth noise):
    $\pi m_n = [\ln(n)/2a]^{1/r}$;
    $\varphi_n^2 = O(\ln(n)^{(2\gamma+1)/r}/n)$, minimax rate (Butucea [3]);
    $\phi_n^2 = O(\ln\ln(n)\ln(n)^{(2\gamma+1)/r}/n)$, adaptive rate (NEW).

$r > 0$, $\rho > 0$ (supersmooth noise):
    $m_n$ solution of (20);
    $\ldots = \ln(n) - (\ln\ln(n))^2$;
    $\varphi_n^2 = O(B_{m_n})$: minimax rate if $r < \rho$ (Butucea and Tsybakov [6]);
    $\varphi_n^2 = O(V_{m_n})$: minimax rate if $r > \rho$, $\rho \le 1$ (NEW);
    $\phi_n^2 = O(m_n^{\rho 1(r>\rho)}\varphi_n^2)$: adaptive minimax rate if $r < \rho$ (no loss)
    (Butucea and Tsybakov [6]); adaptive rate if $r > \rho$, $\rho \le 1$ (NEW).

Very few results on adaptive pointwise estimation are available. We use $|\psi^*(x)| = 1$
in the procedure described in Section 3, with $c_m \le \ldots$ and $x_m^2c_m^2 \le Cx_m\sigma_m^2$ for
the choices of $x_m$ that will be found. Clearly, if $f_\varepsilon$ is ordinary smooth, the choice $x_m =
(2\gamma+3)\ln(m)$ suits and if $f_\varepsilon$ is supersmooth, we can choose $x_m = 4\alpha(\pi m)^\rho$. These choices
coincide with the general case detailed above for $B = 0$. Then we have $\sum_{m\in\mathcal{M}}e^{-x_m}\omega_m^2 \le
C/n$. This implies that

- ^ c .„^(£^ + -»..{/^ {jjnr) }) + ^- 

The rates still correspond to the particular case $B = A = R = 0$ in (4) which are
summarized in Table 2.

Let us mention that in the cases $\rho > 0$, $\alpha > 0$ and $r > 0$, $a > 0$ (i.e., both $f_\varepsilon$ and $g$ are
supersmooth), $x_m$ is of order $m^\rho$. There is no loss due to adaptation if $r < \rho$ as
noticed earlier by Butucea and Tsybakov [6], but, surprisingly, we notice a loss of order
$[\ln(n)]^{\rho/r}$ if $r > \rho$ associated to a rate faster than any power of logarithm. If $r = \rho$, the
loss is logarithmic and the rate polynomial.
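The role of the weights in the ordinary smooth pointwise case can be checked with a short computation. The sketch below assumes that the variance term behaves like $m^{2\gamma+1}/n$ (the Case (III) bound with $B = 0$), so that $e^{-x_m}m^{2\gamma+1} = m^{-2}$ is summable for $x_m = (2\gamma+3)\ln(m)$; the function name `pointwise_weight`, the sample size and the use of the integer part of $\sqrt n$ for the upper end of the collection are illustrative assumptions.

```python
import math

def pointwise_weight(m, gamma, alpha=0.0, rho=0.0):
    """Weights x_m for pointwise deconvolution.

    Ordinary smooth noise: (2 * gamma + 3) * ln(m).
    Supersmooth noise (alpha > 0 and rho > 0): 4 * alpha * (pi * m) ** rho.
    """
    if alpha > 0 and rho > 0:
        return 4.0 * alpha * (math.pi * m) ** rho
    return (2.0 * gamma + 3.0) * math.log(m)

n = 10_000
gamma = 2.0                                  # e.g. Laplace noise
models = range(1, math.isqrt(n) + 1)         # collection of models up to about sqrt(n)

# Sum of exp(-x_m) * m^(2 gamma + 1); each term equals m^(-2).
weighted_sum = sum(
    math.exp(-pointwise_weight(m, gamma)) * m ** (2 * gamma + 1) for m in models
)
```

The sum stays below $\pi^2/6$ whatever $n$, which is why the remainder term is of order $1/n$ under this assumption on the variance.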

The previously defined estimator $\hat\theta_{m_n}$ with $m_n$ defined in Table 2 is adaptive minimax
in the cases: ($r = 0$ and $\rho > 0$) and ($r > 0$, $\rho > 0$ and $r < \rho$). As we already noticed,
estimators $\hat\theta_{\hat m}$, which are free of parameters, may attain a slower rate of convergence $\phi_n$,
that is, it may happen that $\varphi_n = o(\phi_n)$. Therefore, we check that the loss with respect
to the minimax rate, when it occurs, is unavoidable.

Theorem 4.1. The rates defined in Table 2 are adaptive rates and when either $\rho = 0$
or ($r > \rho > 0$ and $\rho \le 1$) the loss with respect to the minimax rate which appears (compare
in Table 2, $\varphi_n^2$ and $\phi_n^2$) is optimal, that is, it satisfies the following lower bounds:

$$\inf_{\hat\theta_n}\ \sup_{\lambda\in\Lambda}\ \sup_{g\in\mathcal{S}(\lambda)}\ \phi_n^{-2}\,\mathbb{E}\big[|\hat\theta_n-\theta(g)|^2\big] \ge c$$

for $n$ large enough, where the infimum is taken over all possible estimators $\hat\theta_n$, under the
additional hypothesis that the noise density is three-times continuously differentiable and

for polynomial noise \f'^{u)\ < C-j — pj^i l^^l ~* oo (21) 

for exponential noise \f^{u)\ < C\uY''^^ cxp{~a\u\'^), as\u\~^oo. (22) 

Moreover, when $r > 0$, $r > \rho$ and $0 < \rho \le 1$ the rate $\varphi_n^2$ is the minimax rate of estimation.

Remark 4.1. Note that the adaptive property of $\hat\theta_{\hat m}$ in the case $r > \rho$ is proved only for
$\rho \le 1$, which is a technical restriction. Nevertheless, it is worth noticing that, still under
the restriction that $\rho \le 1$, we obtain as a by-product in Theorem 4.1 the minimaxity of
the rate for $r > \rho$. This is a new result since the latest result on the subject was proving
minimaxity in the case $r < \rho$ only (see Butucea and Tsybakov [6]).

4.2. Stochastic volatility model 

In this section, we consider the discrete time stochastic volatility model. Let $\eta_i$ be an i.i.d.
centered noise process, $\mathbb{E}(\eta_i^2) = 1$ and let $V_i$ be a sequence of positive random variables.
Assume that we observe $U_1,\dots,U_n$, where

$$U_i = \sqrt{V_i}\,\eta_i, \quad i = 1,\dots,n. \qquad (23)$$

Then the conditional variance of $U_i$ given $V_i$ equals $V_i$, which explains that $V_i$ is called the
volatility process. In many contexts, this process is the process of interest. We assume
moreover that $(V_i)$ and $(\eta_i)$ are independent and $(V_i)$ is a stationary $\beta$-mixing process with
$\beta$-mixing coefficients denoted by $(\beta_k)$. When this model is obtained as the discretization
of a set of continuous time stochastic differential equations, $V_i$ is indeed geometrically
$\beta$-mixing, and $\eta_i \sim \mathcal{N}(0,1)$; see Comte and Genon-Catalot [13].

Model (23) is also considered in this form by van Es et al. [29] among others, under
the assumption $\eta_i \sim \mathcal{N}(0,1)$. Setting

$$Z_i = \ln(U_i^2), \quad X_i = \ln(V_i) \quad\text{and}\quad \varepsilon_i = \ln(\eta_i^2)$$

allows us to write (23) in the form (1). Then, we note that if $\eta_i \sim \mathcal{N}(0,1)$,

$$f_\varepsilon^*(x) = \frac{2^{ix}}{\sqrt\pi}\,\Gamma\Big(\frac12+ix\Big) \quad\text{and}\quad |f_\varepsilon^*(x)| \sim_{|x|\to+\infty} \sqrt2\,e^{-\pi|x|/2}, \qquad (24)$$

by using the Stirling formula $\Gamma(z) \sim_{|z|\to+\infty} \sqrt{2\pi}\,z^{z-1/2}e^{-z}$. We recognize (2) with $\gamma = 0$,
$\alpha = \pi/2$ and $\rho = 1$.
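The exponential decay of the noise characteristic function in the Gaussian case can be verified numerically. The sketch below relies on the classical identity $|\Gamma(1/2+ix)|^2 = \pi/\cosh(\pi x)$, which gives $|f_\varepsilon^*(x)| = 1/\sqrt{\cosh(\pi x)}$ for $\varepsilon = \ln(\eta^2)$ with standard Gaussian $\eta$; the Monte Carlo sample size, the seed and the evaluation points are arbitrary illustrative choices.

```python
import numpy as np

def abs_f_eps_star(x):
    """|f_eps*(x)| for eps = ln(eta^2), eta ~ N(0, 1), via |Gamma(1/2 + ix)|^2 = pi / cosh(pi x)."""
    return 1.0 / np.sqrt(np.cosh(np.pi * np.asarray(x, dtype=float)))

def supersmooth_envelope(x):
    """Asymptotic equivalent sqrt(2) * exp(-pi |x| / 2)."""
    return np.sqrt(2.0) * np.exp(-np.pi * np.abs(x) / 2.0)

# Monte Carlo cross-check of the closed form at one point.
rng = np.random.default_rng(1)
eps = np.log(rng.standard_normal(200_000) ** 2)
x_check = 0.7
mc_modulus = abs(np.mean(np.exp(1j * x_check * eps)))

# Ratio to the asymptotic equivalent at a moderately large argument.
ratio_at_5 = float(abs_f_eps_star(5.0) / supersmooth_envelope(5.0))
```

The ratio is already indistinguishable from 1 at $x = 5$, which illustrates how quickly the equivalent $\sqrt2\,e^{-\pi|x|/2}$ becomes accurate.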

Applying the results of Section 4.1 in the mixing context (D1) (see Proposition 2.2
and Corollary 3.1), we deduce that, if $V$ is geometrically $\beta$-mixing, we have a pointwise
estimator of $g$,



for which we can propose an automatic selection of $m$ which reaches the adaptive or
adaptive minimax rate. The resulting rate is a negative power of $\ln(n)$ if $g$ is in a Sobolev
space but it is much faster if $g$ is supersmooth (a case which is easy to meet; see the
examples in Comte and Genon-Catalot [13]). Therefore, we recover as a particular case,
and substantially improve, the result of van Es et al. [29], who propose a non-adaptive
kernel estimator of $g$, assuming that $g$ is known to be twice continuously differentiable.

Now, extensions of the class of discrete time stochastic volatility models have been
studied (see Genon-Catalot and Kessler [17] or Chaleyat-Maurel and Genon-Catalot [11])
and, in particular, it is natural to consider more general types of distributions for $\eta_i$.
For instance, we suppose now that $\eta_i^2$ follows a Gamma distribution, that is, $f_{\eta^2}(x) =
(e^{-x}x^{p-1}/\Gamma(p))\mathbf{1}_{x>0}$. In that case, we find

$$f_\varepsilon^*(x) = \frac{\Gamma(p+ix)}{\Gamma(p)} \quad\text{and}\quad |f_\varepsilon^*(x)| \sim_{|x|\to+\infty} \frac{\sqrt{2\pi}}{\Gamma(p)}\,|x|^{p-1/2}e^{-\pi|x|/2}, \qquad (25)$$

that is, $\varepsilon$ is supersmooth with $\gamma = 1/2 - p$, $\alpha = \pi/2$ and $\rho = 1$ in (2). The Gaussian case
corresponds to $p = 1/2$. Let us recall that the Laplace transform $Lu$ of a real valued
function $u$ is defined by $Lu(x) = \int e^{-xt}u(t)\,dt$ as soon as it exists, and the Laplace
transform of a non-negative random variable $Y$ is defined by $\mathbb{E}(e^{-xY})$. In this context, let
$\pi$ denote the density of $V_1$, and consider that we are interested in estimating the Laplace
transform of $V_1$. In fact, our general method provides an estimator of $h(\lambda) = -(L\pi)'(\lambda) =
\mathbb{E}(V_1e^{-\lambda V_1})$, that is, minus the derivative of the Laplace transform of $\pi$. In other words,
we can estimate $h(\lambda) = \langle\psi_\lambda, g\rangle = \mathbb{E}(V_1e^{-\lambda V_1}) = \mathbb{E}(e^{X_1-\lambda e^{X_1}})$. Actually we have, for $\lambda > 0$,

$$h(\lambda) = \langle\psi_\lambda, g\rangle, \quad\text{with}\quad \psi_\lambda(x) = e^{x-\lambda e^{x}}$$

and

$$\psi_\lambda^*(x) = \lambda^{-1-ix}\,\Gamma(1+ix), \qquad |\psi_\lambda^*(x)| \sim_{|x|\to+\infty} \frac{\sqrt{2\pi}}{\lambda}\,|x|^{1/2}e^{-\pi|x|/2} \qquad (26)$$

(i.e., $A = \pi/2$ and $R = 1$ in (4)). Let us define

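$$\hat h_m(\lambda) = \frac{1}{2\pi n}\sum_{k=1}^{n}\int_{-\pi m}^{\pi m} e^{itZ_k}\,\frac{\psi_\lambda^*(-t)}{f_\varepsilon^*(t)}\,dt \qquad (27)$$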

with $f_\varepsilon^*$ and $\psi_\lambda^*$ given by (25) and (26). Then, taking into account the orders of $f_\varepsilon^*$ and
$\psi_\lambda^*$, we obtain, by applying inequality (11) of Proposition 2.2 and if $p \neq 3/2$:

$$\mathbb{E}[(\hat h_m(\lambda) - h(\lambda))^2] \le K m e^{-\pi^2 m} + K'\,\frac{m^{(3-2p)_+}}{n} + \frac{K''}{n},$$

where $K$, $K'$ and $K''$ are positive constants, $K'' = 2(\int |\psi_\lambda^*|)^2$. If $p = 3/2$, the variance
term has order $\ln(m)/n$. Then notice that (D1) is satisfied in our model. Therefore, we
get

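Proposition 4.1. Consider model (23) with (D1), (25) and (26). Assume that $(X_k) =
(\ln(V_k))$ is $\beta$-mixing with $\sum_k \beta_k < +\infty$; then $\hat h_m$ defined by (27) satisfies, for $\lambda > 0$,

$$\mathbb{E}[(\hat h_m(\lambda) - h(\lambda))^2] \le K m e^{-\pi^2 m} + K'\,\frac{m^{(3-2p)_+}}{n} + \frac{K''}{n},$$

where $K$, $K'$ and $K''$ are positive constants.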

In other words, using Table 1 we obtain a rate of order $[\ln(n)]^{(3-2p)_+}/n$ (i.e., always
less than $\ln^3(n)/n$), whatever the smoothness of $g$ is.
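
The bias-variance compromise behind this rate can be illustrated numerically. The sketch below is a hedged illustration only: it takes the risk bound in the form $K m e^{-\pi^2 m} + K' m^{3-2p}/n + K''/n$ for $p < 3/2$, sets all three constants to 1 (an arbitrary assumption), and minimizes over a grid of cutoffs $m$. The function names and grid are hypothetical.

```python
import math


def risk_bound(m, n, p, k_bias=1.0, k_var=1.0, k_mix=1.0):
    """Upper bound K*m*exp(-pi^2 m) + K'*m^(3-2p)_+/n + K''/n (constants are placeholders)."""
    squared_bias = k_bias * m * math.exp(-math.pi ** 2 * m)
    variance = k_var * m ** max(3.0 - 2.0 * p, 0.0) / n
    return squared_bias + variance + k_mix / n


def best_cutoff(n, p, m_max=10.0, n_grid=1000):
    """Grid minimizer of the bound over m in (0, m_max]; intended for p < 3/2."""
    candidates = [m_max * k / n_grid for k in range(1, n_grid + 1)]
    return min(candidates, key=lambda m: risk_bound(m, n, p))


if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 6, 10 ** 8):
        m_star = best_cutoff(n, p=0.5)
        print(n, m_star, math.log(n) / math.pi ** 2, risk_bound(m_star, n, 0.5))
```

In this toy computation the minimizing cutoff stays close to $\ln(n)/\pi^2$, so the variance term $m^{3-2p}/n$ evaluated there is of order $[\ln(n)]^{3-2p}/n$.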

No adaptation is required if $p > 3/2$. If $p \le 3/2$, the risk of the adaptive estimator is
obtained by applying Corollary 3.1 and by choosing $x_m = 4\ln(m)$:

Proposition 4.2. Consider the stochastic volatility model (23) with (D1), (25) and
(26). Assume that $(X_i)$ is geometrically $\beta$-mixing and consider $\hat h_{\hat m}$ defined by (27), with
$\hat m$ defined by (17). For any $\lambda > 0$ and $p \le 3/2$,



E[{hn{X) - h{X)f] 



<K inf 



2 

'iu)rxiu)\du' 

>7Tm 

(m3-2pip<3/2 + ln(m)Ip=3/2) In(TO) 



,ln(n) 



This corresponds to the case where a loss of order $\ln(\ln(n))$ occurs with respect to the
non-adaptive rate.
Remark 4.2. The Gaussian case, $p = 1/2$, is not especially studied here because
another strategy is available. Indeed, for $\eta_1 \sim \mathcal{N}(0,1)$, $\mathbb{E}(e^{i\sqrt{2\lambda}U_1}) = \mathbb{E}[\mathbb{E}(e^{i\sqrt{2\lambda V_1}\eta_1} \mid V_1)] =
\mathbb{E}(e^{-\lambda V_1})$. Therefore the Laplace transform of $\pi$, $L\pi(\lambda)$, can be directly estimated by
an empirical mean of the $\exp(i\sqrt{2\lambda}U_k)$, which is an unbiased estimator reaching the
parametric rate $1/n$. The rate would be the same for estimating $h$, as by differentiating,

$$h(\lambda) = \mathbb{E}(V_1e^{-\lambda V_1}) = (-i/\sqrt{2\lambda})\,\mathbb{E}(U_1e^{i\sqrt{2\lambda}U_1}).$$

The method above reaches, for $p = 1/2$, the rate $\ln^{\omega}(n)\ln(\ln(n))/n$, where $1 < \omega < 2$.
Therefore, it is not optimal for any $p$. But the last strategy here exploits an additional
assumption ($\eta$ is Gaussian) which the general methods do not take into account.
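
The direct strategy of Remark 4.2 is easy to simulate. The following is a minimal sketch under assumptions that are not in the original: the volatilities $V_k$ are drawn i.i.d. from an Exp(1) law (so that $L\pi(\lambda) = 1/(1+\lambda)$ and $h(\lambda) = 1/(1+\lambda)^2$ are known in closed form), and the function names, sample size and seed are illustrative.

```python
import cmath
import math
import random


def simulate_returns(n, seed=0):
    """Draw U_k = sqrt(V_k) * eta_k with V_k ~ Exp(1) i.i.d. (an assumption) and eta_k ~ N(0,1)."""
    rng = random.Random(seed)
    return [math.sqrt(rng.expovariate(1.0)) * rng.gauss(0.0, 1.0) for _ in range(n)]


def laplace_estimate(u_values, lam):
    """Empirical mean of exp(i sqrt(2 lam) U_k), an unbiased estimator of E exp(-lam V)."""
    s = math.sqrt(2.0 * lam)
    mean = sum(cmath.exp(1j * s * u) for u in u_values) / len(u_values)
    return mean.real


def h_estimate(u_values, lam):
    """Empirical version of (-i / sqrt(2 lam)) E(U exp(i sqrt(2 lam) U)) = E(V exp(-lam V))."""
    s = math.sqrt(2.0 * lam)
    mean = sum(u * cmath.exp(1j * s * u) for u in u_values) / len(u_values)
    return ((-1j / s) * mean).real


if __name__ == "__main__":
    data = simulate_returns(50_000)
    print(laplace_estimate(data, 1.0), 1.0 / 2.0)  # target 1/(1+lam)
    print(h_estimate(data, 1.0), 1.0 / 4.0)        # target 1/(1+lam)^2
```

Both estimators are plain empirical means of bounded or square-integrable variables, which is why the parametric rate is reached in the Gaussian case.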

5. Proofs 

5.1. Selected m for Table 1 

The squared bias variance compromise is performed via the following choices of $m$, denoted
by $m_n$, in the cases enumerated in Table 1:

(1), (2) and (3) (a) Case $r \vee R = 0$: $m_n = O(n^{1/(2b+2B-1)})$, as $2b - 1 > 0$ when $r = 0$.
(b) Case $r \vee R > 0$: $\pi m_n = (\ln(n)/C)^{1/(r\vee R)}$ for some $C > A + a$.

(4) Optimal $m_n$ is such that $2a(\pi m_n)^r + 2A(\pi m_n)^R = \ln(n) - (2b + 2B - 1)\ln(m_n)$.
Take, e.g., $\pi m_n = (\ln(n)/C)^{1/(r\vee R)}$ with sufficiently large $C > 0$.

(5) Take $m_n = O(n^{1/(2b+2B-1)})$.

(6) Optimal $m_n$ is such that $2a(\pi m_n)^r + 2A(\pi m_n)^R = \ln(n) - (2b + 2\gamma)\ln(m_n)$, which
gives $\pi m_n = (\ln(n)/(2a) - (A/a)(\ln(n)/(2a))^{R/r} - ((b+\gamma)/(ar))\ln\ln(n))^{1/r}$ if $r > R$,
and exchange $R$ and $r$ in the last expression if $R > r$. For an easier choice, take, for
example, $m_n = (\ln(n)/C)^{1/(r\vee R)}$ for $C > 0$ large enough.

(7) $m_n = O(n^{1/(2b+2\gamma)})$, $b + \gamma > 0$. (8) and (9) already discussed.

(10) The optimal $m_n$ is $\pi m_n = (\ln(n)/(2\alpha) - ((b+\gamma)/(\alpha\rho))\ln\ln(n))^{1/\rho}$. For a simpler
form it is sufficient to take, for example, $\pi m_n = (\ln(n)/(4\alpha))^{1/\rho}$.

The parameters $a$, $b$, $r$ of the unknown function appear several times to select $m_n$. As $g$
is unknown, and thus $a$, $b$, $r$ are unknown, it is not possible to select $m_n$ in all the cases
where the rate is slower than the parametric rate $n^{-1}$.
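
As an illustration of how such a cutoff is evaluated in practice, the sketch below computes the two choices of case (10), reading them as $\pi m_n = (\ln(n)/(2\alpha) - ((b+\gamma)/(\alpha\rho))\ln\ln(n))^{1/\rho}$ and the simpler $\pi m_n = (\ln(n)/(4\alpha))^{1/\rho}$. This reading of the formulas, the function names and the numerical values are assumptions made for the example.

```python
import math


def optimal_cutoff(n, b, gamma, alpha, rho):
    """m_n solving pi*m_n = (ln n/(2 alpha) - (b+gamma)/(alpha rho) * ln ln n)^(1/rho)."""
    base = math.log(n) / (2.0 * alpha) - (b + gamma) / (alpha * rho) * math.log(math.log(n))
    if base <= 0.0:
        raise ValueError("n is too small for this asymptotic formula")
    return base ** (1.0 / rho) / math.pi


def simple_cutoff(n, alpha, rho):
    """Simpler choice pi*m_n = (ln n/(4 alpha))^(1/rho), which does not involve b."""
    return (math.log(n) / (4.0 * alpha)) ** (1.0 / rho) / math.pi


if __name__ == "__main__":
    # Illustrative values: Gaussian-type noise (alpha = pi/2, rho = 1, gamma = 0) and b = 1.
    n = 10 ** 6
    print(optimal_cutoff(n, b=1.0, gamma=0.0, alpha=math.pi / 2, rho=1.0))
    print(simple_cutoff(n, alpha=math.pi / 2, rho=1.0))
```

The simpler choice is attractive because it depends only on the known noise parameters $\alpha$ and $\rho$, whereas the first formula requires the unknown smoothness $b$.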
5.2. Proof of Theorem 3.1

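We insert here general weights $x_{j,m}$ such that

$$H(j,m) = 4\Big(1 + \frac{1}{a}\Big)\big(x_{j,m}\sigma_{j,m}^2 + x_{j,m}^2c_{j,m}^2\big).$$

We define

$$S(m) = [\theta_m - \theta(g)]^2 + \sigma_m^2 + \sup_{j<m} x_{j,m}\sigma_{j,m}^2$$

and

$$m_{\mathrm{opt}} = \inf\Big\{m \in \mathcal{M},\ \mathrm{Crit}(m) \le \inf_{l\in\mathcal{M}} \mathrm{Crit}(l) + \frac{1}{n}\Big\}.$$

It is sufficient to prove the following theorem: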

Theorem 5.1. There exists some positive constant $C(a)$ depending only on $a$, such that
$$\mathbb{E}[(\hat\theta_{\hat m} - \theta)^2] \le C(a)\big(\mathrm{Crit}(m_{\mathrm{opt}}) + S(m_{\mathrm{opt}})\big)$$



\meA4 j>map 



g =^3. -"opt ^2 ^ i. I 
■' n I 



where = ct^ V c„, + 2(ct^ V c„0^ = al„,^^^ V Cj>, + 2{al„,^^^ V Cj,™)^. 

First, note that Theorem 5.1 implies Theorem 3.1. Indeed, note that for $j > m$, we
have $\sigma_{j,m}^2 \le \sigma_j^2$ and $c_{j,m} \le c_j$. Therefore, choosing $x_{j,m} = x_j$ implies that



e "'"wi 



Moreover $\mathrm{Crit}(m) \le (\int_{|x|\ge\pi m} |\psi^*(x)g^*(x)|\,dx)^2 + \mathrm{pen}(m)$ and $S(m) \le (\int_{|x|\ge\pi m} |\psi^*(x)
g^*(x)|\,dx)^2 + 2\,\mathrm{pen}(m)$. This implies Theorem 3.1.
Now we establish the following lemma:

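Lemma 5.1. For all $m \in \mathcal{M} := \{1, \ldots, m_n\}$, for all $x > 0$,

$$P\Big(\widehat{\mathrm{Crit}}(m) > (1 + a)\,\mathrm{Crit}(m) + 4\Big(1 + \frac{1}{a}\Big)(x + x^2)\Big) \le \sum_{j>m,\ j\in\mathcal{M}} 2e^{-x_{j,m}}e^{-x/(\sigma_{j,m}^2\vee c_{j,m})}.$$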
Proof. Recall that the Bernstein inequality for a sum $S_n = \sum_{k=1}^{n} Y_k$ of i.i.d. random
variables $Y_k$ having $\mathrm{var}(Y_1) \le v^2$ and $\|Y_1\|_\infty \le 1/a$ states that

$$P\Big(\frac{S_n - \mathbb{E}(S_n)}{n} \ge \sqrt{\frac{2uv^2}{n}} + \frac{u}{an}\Big) \le \exp(-u).$$

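We put for $j > m$

$$Y_k = Y_k(j,m) = \frac{1}{2\pi}\int_{\pi m\le|t|\le\pi j} e^{itZ_k}\,\frac{\psi^*(-t)}{f_\varepsilon^*(t)}\,dt. \qquad (28)$$

Then $S_n/n = \hat\theta_j - \hat\theta_m$ and $\mathbb{E}(S_n/n) = \mathbb{E}(\hat\theta_j - \hat\theta_m) = \theta_j - \theta_m$. Moreover, we obtain that
$v^2/n \le \sigma_{j,m}^2$ and $1/(an) = c_{j,m}$. It follows that

$$P\{|(\hat\theta_j - \hat\theta_m) - (\theta_j - \theta_m)| > \sigma_{j,m}\sqrt{2u} + c_{j,m}u\} \le 2e^{-u}.$$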

Now, from the simple fact that $(x + y)^2 \le (1 + 1/a)x^2 + (1 + a)y^2$ for any real numbers
$x$, $y$, we deduce by setting $u = y$ and $v = x + y$ that $(v - u)^2 \ge (1/(1 + 1/a))v^2 - ((1 +
a)/(1 + 1/a))u^2$. Use also the fact that $(A + B)^2 \le 2(A^2 + B^2)$ for any real numbers $A$, $B$,
to obtain

$$P\{(\hat\theta_j - \hat\theta_m)^2 > (1 + a)(\theta_j - \theta_m)^2 + 2(1 + 1/a)(2\sigma_{j,m}^2u + c_{j,m}^2u^2)\} \le 2e^{-u}.$$
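Now we set $u = x_{j,m} + x/(\sigma_{j,m}^2 \vee c_{j,m})$ and we find

$$P\Big\{(\hat\theta_j - \hat\theta_m)^2 - H(j,m) > (1 + a)(\theta_j - \theta_m)^2 + 4\Big(1 + \frac{1}{a}\Big)(x + x^2)\Big\}
\le 2e^{-x_{j,m}}e^{-x/(\sigma_{j,m}^2\vee c_{j,m})}.$$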
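To conclude we write

$$P\Big(\widehat{\mathrm{Crit}}(m) > (1 + a)\,\mathrm{Crit}(m) + 4\Big(1 + \frac{1}{a}\Big)(x + x^2)\Big)$$
$$\le P\Big(\exists j > m,\ j \in \mathcal{M},\ (\hat\theta_j - \hat\theta_m)^2 - H(j,m) > (1 + a)(\theta_j - \theta_m)^2 + 4\Big(1 + \frac{1}{a}\Big)(x + x^2)\Big)$$
$$\le \sum_{j>m,\ j\in\mathcal{M}} 2e^{-x_{j,m}}e^{-x/(\sigma_{j,m}^2\vee c_{j,m})}.$$

This ends the proof of Lemma 5.1. □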
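
The Bernstein inequality recalled at the start of this proof can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the proof: the $Y_k$ are taken uniform on $[-1,1]$ (so $v^2 = 1/3$ and $1/a = 1$), the inequality is used in the form $P((S_n - \mathbb{E}S_n)/n \ge \sqrt{2uv^2/n} + u/(an)) \le e^{-u}$, and the function names, sample sizes and seed are hypothetical.

```python
import math
import random


def bernstein_threshold(u, v2, a, n):
    """Deviation level sqrt(2 u v^2 / n) + u / (a n) for the empirical mean."""
    return math.sqrt(2.0 * u * v2 / n) + u / (a * n)


def exceedance_frequency(u, n=50, n_rep=20_000, seed=1):
    """Monte Carlo frequency of {S_n/n >= threshold} for centred Uniform[-1, 1] variables."""
    rng = random.Random(seed)
    v2, a = 1.0 / 3.0, 1.0  # variance of Uniform[-1, 1] and sup-norm bound 1/a = 1
    threshold = bernstein_threshold(u, v2, a, n)
    hits = 0
    for _ in range(n_rep):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        if mean >= threshold:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    for u in (0.5, 1.0, 2.0):
        print(u, exceedance_frequency(u), math.exp(-u))
```

The observed frequencies sit well below $e^{-u}$, which is expected since the bound is not tight for such light-tailed variables.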
Now we follow the steps of the proof of Laurent et al. [24].

• We first consider the case where $\hat m \le m_{\mathrm{opt}}$. Following the same lines of proof, we get



\{em - 0{g)f > (1 + a) (7rit (mopt) + 4 ( 1 + i 
+ sup H{mopt,j) + - 0{g)f + 1 n {m < Wopt} ) (29)



< ^ e ^J'"opte" 

j>mopt 



Now we consider the case $\hat m > m_{\mathrm{opt}}$. We apply the Bernstein inequality to



27T7|t|<Trm /e* W 

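in the same way as in Lemma 5.1. We obtain, for all $m \in \mathcal{M}$,

$$P\Big((\hat\theta_m - \theta(g))^2 > (1 + a)(\theta_m - \theta(g))^2 + 4\Big(1 + \frac{1}{a}\Big)(x + x^2) + \mathrm{pen}(m)\Big)
\le 2e^{-x_m}e^{-x/(\sigma_m^2\vee c_m)}.$$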
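This implies that

$$P\Big((\hat\theta_{\hat m} - \theta(g))^2 > (1 + a)(\theta_{\hat m} - \theta(g))^2 + 4\Big(1 + \frac{1}{a}\Big)(x + x^2) + \mathrm{pen}(\hat m)\Big)
\le \sum_{m\in\mathcal{M}} 2e^{-x_m}e^{-x/(\sigma_m^2\vee c_m)}.$$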



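As $\sup_{j\ge m}[(\hat\theta_m - \hat\theta_j)^2 - H(j,m)] \ge (\hat\theta_m - \hat\theta_m)^2 - H(m,m) = 0$, we have $\widehat{\mathrm{Crit}}(m) \ge
\mathrm{pen}(m)$. Using the inequalities $\mathrm{pen}(\hat m) \le \widehat{\mathrm{Crit}}(\hat m) \le \widehat{\mathrm{Crit}}(m_{\mathrm{opt}}) + 1/n$, we obtain

$$P\Big((\hat\theta_{\hat m} - \theta(g))^2 > (1 + a)(\theta_{\hat m} - \theta(g))^2 + 4\Big(1 + \frac{1}{a}\Big)(x + x^2) + \widehat{\mathrm{Crit}}(m_{\mathrm{opt}}) + \frac{1}{n}\Big)
\le \sum_{m\in\mathcal{M}} 2e^{-x_m}e^{-x/(\sigma_m^2\vee c_m)}.$$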



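If $\hat m > m_{\mathrm{opt}}$, then $(\theta_{\hat m} - \theta(g))^2 \le \sup_{j>m_{\mathrm{opt}}}(\theta_j - \theta(g))^2$ and we apply Lemma 5.1 with
$m = m_{\mathrm{opt}}$. This yields

$$P\Big(\Big\{(\hat\theta_{\hat m} - \theta(g))^2 > (1 + a)\sup_{j>m_{\mathrm{opt}}}(\theta_j - \theta(g))^2 + 8\Big(1 + \frac{1}{a}\Big)(x + x^2)
+ (1 + a)\,\mathrm{Crit}(m_{\mathrm{opt}}) + \frac{1}{n}\Big\} \cap \{\hat m > m_{\mathrm{opt}}\}\Big)$$
$$\le \sum_{m\in\mathcal{M}} 2e^{-x_m}e^{-x/(\sigma_m^2\vee c_m)} + \sum_{j>m_{\mathrm{opt}}} 2e^{-x_{j,m_{\mathrm{opt}}}}e^{-x/(\sigma_{j,m_{\mathrm{opt}}}^2\vee c_{j,m_{\mathrm{opt}}})}. \qquad (30)$$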
Let



2 

Cmopt =3(l + a)Cn<(mopt)+2 sup H{m,o-pt,3) + {I + &) sup {Oj - 6{g)) +- 

j<mopt j>mopt 



and 



$$X = (\hat\theta_{\hat m} - \theta(g))^2, \qquad Y = 2(\hat\theta_{m_{\mathrm{opt}}} - \theta(g))^2.$$



It follows from (29) and (30) that, for all x > 0, 



X-y >C,„„p, +24( 1 + i )(xVx2) 



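We write that $\mathbb{E}(X) = \mathbb{E}(X\mathbf{1}_{X>Y+C_{m_{\mathrm{opt}}}}) + \mathbb{E}(X\mathbf{1}_{X\le Y+C_{m_{\mathrm{opt}}}}) \le \mathbb{E}[(X - Y - C_{m_{\mathrm{opt}}})_+] +
\mathbb{E}(Y + C_{m_{\mathrm{opt}}})$.

Then, setting $C_a = 24(1 + 1/a)$ and $Z = X - Y - C_{m_{\mathrm{opt}}}$,

$$\mathbb{E}[Z_+] = \int_0^{+\infty} P(Z > t)\,dt = C_a\Big[\int_0^1 P(Z > C_au)\,du + \int_1^{+\infty} P(Z > C_au)\,du\Big]$$
$$= C_a\Big[\int_0^1 P(Z > C_a(u \vee u^2))\,du + 2\int_1^{+\infty} P(Z > C_a(v \vee v^2))\,v\,dv\Big],$$

so that

$$\mathbb{E}[(X - Y - C_{m_{\mathrm{opt}}})_+] \le C_a\sum_{m\in\mathcal{M}} 2e^{-x_m}\big(\sigma_m^2 \vee c_m + 2(\sigma_m^2 \vee c_m)^2\big)$$
$$+\ C_a\sum_{j>m_{\mathrm{opt}}} 2e^{-x_{j,m_{\mathrm{opt}}}}\big(\sigma_{j,m_{\mathrm{opt}}}^2 \vee c_{j,m_{\mathrm{opt}}} + 2(\sigma_{j,m_{\mathrm{opt}}}^2 \vee c_{j,m_{\mathrm{opt}}})^2\big)$$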



= cJy 26-^"^^+ E 



^J,"opt^2 

"^J.m-opt 



j>mo 

The end of the proof is the same as in Laurent et al. [24].
5.3. Proof of Proposition 2.2 

The same decomposition of the risk and upper bound for the bias hold, as in Section 2.2.
Only the variance has to be re-examined. The basic idea is that, for $k \neq l$,
$\mathrm{cov}(e^{itZ_k}, e^{isZ_l}) = f_\varepsilon^*(t)f_\varepsilon^*(-s)\,\mathrm{cov}(e^{itX_k}, e^{isX_l})$ by conditioning on $(X_k, X_l)$. The
additional trick is the standard covariance inequality for $\beta$-mixing variables (see, e.g.,
Doukhan [15]), which implies that $|\mathrm{cov}(e^{itX_k}, e^{isX_l})| \le \beta_{|k-l|}$.

pTCm pTCm 



Var(^„J = — ^ ^ / / cov(e"^^e-^0^*WV'*(-s)dsd^ 

^ 7, u_Lrj <J —Tim J —Tcm 

(31) 



The last term is the standard variance term of the independent case. The first one is 
bounded in modulus by 

^ / / |cov(e'*-^Se'"^^-'=)||V'*(i)V'*(-s)|dsdt 

J,. ^ «^ „ J —Tcni J —Tim 



l<A;<^<n' 



This gives the result. 



5.4. Proof of Proposition 2.3 

Under (D2), we only obtain that for $k < l$, $\mathrm{cov}(e^{itZ_k}, e^{isZ_l}) = f_\varepsilon^*(-s)\,\mathrm{cov}(e^{itZ_k}, e^{isX_l})$
by conditioning on $(X_i)$. The covariance inequality for $\beta$-mixing variables (see, e.g.,
Doukhan [15]) still applies (but to the variables $(X_k, Z_k)$ and $(X_l, Z_l)$) and implies that
$|\mathrm{cov}(e^{itZ_k}, e^{isX_l})| \le \beta_{|k-l|}$. Then (31) remains true but leads, for the bound of the modulus
of the last term, to:
2 



V 

l<k<i<n 



n m 'J — 7Tm 



1 



fe=i 



cov(e' 



)l 



nit) 



feit) 

dt 



ds dt 



This gives inequality (14). 

For the proof of (16), the result follows from the inequality 



\rmdt 



\r/f:mdt\< 



\r/f:mdt 



and the fact that the new mixing term is always negligible with respect to the independent
variance term if $\varepsilon$ is supersmooth (case $\alpha, \rho > 0$). If $\varepsilon$ is ordinary smooth, then we only
have to study when m'^~-^+^)++'''^^+^ is less than m'^^~^^^^ , which occurs as soon as 
7 > max(i3, 1). 
5.5. Proof of Corollary 3.1

The main difference with respect to the proof of Theorem 3.1 lies in the Bernstein
inequality, which must be written in the mixing context. For geometrically mixing variables
(and $q = q_n = 2\ln(n)/c$ if $\beta_k \le e^{-ck}$), we get from Theorem 4, page 36 in Doukhan [15]
that

\ n \l n can J n^ ' 

with $\|Y_1\|_\infty \le 1/a$ and $(1/q)\mathrm{Var}(\sum_{k=1}^{q} Y_k) \le v^2$.

In all cases, $|\mathcal{M}| \le n$, so that summing up the residuals of order $1/n^2$ will give negligible
terms of order $1/n$. Next, the variables are still given by (28) with $\tilde c_{j,m}$ and $\tilde c_m$ the
same as previously multiplied by $2\ln(n)/c$. This gives $\tilde c_{j,m} = (2\ln(n)/c)\,c_{j,m}$ and $\tilde c_m =
(2\ln(n)/c)\,c_m$. At last, it follows from the above computation of $\mathrm{Var}(\hat\theta_m)$ that the new
variance terms denoted by $\tilde\sigma_{j,m}^2$ can be bounded under (D1) by

^Im < 4™ + ^Y.^ f \r(t)\dt)\ 

k>l V-'"(™^j)<l*l<"(™Vj) / 

and analogously for $\tilde\sigma_m^2$. It follows from our set of assumptions that $\tilde\sigma_{j,m}^2 \le \sigma_{j,m}^2 + c/n \le
2\sigma_{j,m}^2$ and $\tilde\sigma_m^2 \le 2\sigma_m^2$. The case (D2) is analogous under the given more restrictive
assumptions. Corollary 3.1 follows.



5.6. Proof of Theorem 4.1 

We describe first the general procedure for proving the theorem and postpone details of
constructions and proofs to Section 5.6. As the adaptation loss is different according to
whether $r = 0$ or $r > 0$, respectively $\rho = 0$ or $\rho > 0$, explicit constructions are needed for
each of the following setups: (1) $r = 0$, $\rho = 0$; (2) $b = 0$, $r > 0$, $\rho = 0$; (3) $b = 0$, $r > 0$, $0 <
\rho < 1$ and $r > \rho$. We take $b = 0$ without loss of generality, in order to simplify polynomial
factors in our explicit constructions.

Typically, we construct two probability densities go e '5(A) and gi^„ G '5(A) where 
A, A S A. Moreover 

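$$g_{1,n}(x) = g_0(x) + G(x - x_0, m) \quad\text{for } m = m_n \to \infty \text{ with } n \text{ and } \int G(\cdot, m) = 0\ \ \forall m.$$
Note that the likelihoods of the model become $f_0^Z = g_0 \star f_\varepsilon$ under $g_0$ and

$$f_{1,n}^Z(x) = [g_{1,n} \star f_\varepsilon](x) = f_0^Z(x) + [G(\cdot, m) \star f_\varepsilon](x - x_0)$$

under $g_{1,n}$. Then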

infsup sup cj,-lEg[\er,-9{g)\^] 

^" AeAge5(A) 
> inf max{0-iE,„ [|0„ - 0(.9o)l1, CaE^. ^ [K -

> inf max{g2Eg„[r2],E,,_„[|T„ - G(0, m)/0„,^|2]}, 

where g„ = (t>n,x/4'n a °° when n ^ oo, with a proper choice of A, A and T„ = (0„ — 
e(5o))/0„,A. 

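From now on we denote $P_0 = P_{g_0}$, $\mathbb{E}_0 = \mathbb{E}_{g_0}$ and $P_1 = P_{g_{1,n}}$, $\mathbb{E}_1 = \mathbb{E}_{g_{1,n}}$. Following
Theorem 6 in Tsybakov [28] we can deduce that, if $|G(0,m)/\varphi_{n,\lambda}| \ge c > 0$ and if for some
fixed $0 < \varepsilon < 1$ and $\tau > 0$

$$P_1\Big(\frac{dP_0}{dP_1} \ge \tau\Big) \ge 1 - \varepsilon, \qquad (32)$$

then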

infmax{g2Eo[r„2],Ei[|r„ ~ G(0, m)/0„aP]} > T^f^f^i 2 ■ (33) 

If we can choose $\tau = \tau_n$ such that $\tau_nq_n^2 \to \infty$ with $n$, then the bound from below in (33)
tends to $c^2(1 - \varepsilon)^2$, so it will be larger than $c^2(1 - \varepsilon)^2 > 0$ for $n$ large enough. Note also that
(33) may provide the exact asymptotic constant in case 1 and $P_1(dP_0/dP_1 \ge \tau_n) \to 1$
as $n \to \infty$.

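In order to deal with (32), we proceed as follows:

$$P_1\Big(\frac{dP_0}{dP_1} \ge \tau\Big) = P_1\Big(\frac{\sum_{i=1}^{n} Z_{i,n} - n\mathbb{E}_1(Z_{1,n})}{(n\mathrm{Var}_1(Z_{1,n}))^{1/2}} \ge \frac{\ln(\tau) - n\mathbb{E}_1(Z_{1,n})}{(n\mathrm{Var}_1(Z_{1,n}))^{1/2}}\Big),$$

where $Z_{i,n} = \ln(1 - [G(\cdot - x_0, m) \star f_\varepsilon](Z_i)/[g_{1,n} \star f_\varepsilon](Z_i))$ form a triangular array of
independent variables. Denote

$$U_n = \frac{\sum_{i=1}^{n} Z_{i,n} - n\mathbb{E}_1(Z_{1,n})}{(n\mathrm{Var}_1(Z_{1,n}))^{1/2}}.$$

We shall prove, for each setup, Lyapunov's central limit theorem for $U_n$. Moreover, we
give a lower bound $\mathbb{E}_1(Z_{1,n}) \ge -c_e\kappa_n$ and an upper bound $\mathrm{Var}_1(Z_{1,n}) \le c_v\kappa_n$, where
$\kappa_n$ is such that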

X (.90*/e,ffl,n */e) / 7 < 
as $n \to \infty$. Choose then $\tau_n > 0$ such that

ln(r„) 



(ct,nK„)i/2 
with $n$, giving that $P_1(U_n \ge u_n) \ge 1 - \varepsilon$ for some $0 < \varepsilon < 1$ and large enough $n$ and thus
concluding the proof of the theorem.

Now, we study in more detail the different cases. 

(1) Case r = 0, p = and A= [6,&] x [L,L] C (1/2, cx)) x (0,cx)). 

Let us choose in the class 5(6, L/2) such that go > and 50(2;) > c|x|~^ as \x\ 00. 
We choose next the function G such that G{x, m) = m~-+^/^G(TOx) and with G* at least 
three-times continuously differentiable having the property 

/(l/2<|u|<3/4) ^ /(1/4<|^|<1) 
c(l + - ^ - c(l + u2&) • 

Here, m= (coln(n)/n)-i/(2fc+27)_ j^^^^ ^-j^^^^ q*(^q^ ^ / G = 0. First, gi,„ is a positive 
function with an integral equal to 1 and it belongs to S{b,L). Indeed, for each fixed x 
we have G{x, m) when n — > 00 and as G* is three times continuously differentiable 
that means |G(x,m)| < 0(|x|~^) = 0(^0(2;)) as |x| — > 00, giving that gi^n > for n large 
enough. Moreover, 



1/2 

-m^^-i/2| / \G*{u/m)\^\u\^^du 

'l/4<|M|/ni<l 



c vyi/4 {i+u^'-f 



for c > large enough. Second, 



G(0,m) 



1 r ^"^+1/2 /.3/4 

^)-i^-b+i/2_L G*{u)du>%- / dli > Ci • Co > 0. 



'1/2 

We shall prove that (32) holds with t = n"(27+i)/(2fc+27) and together with the fact that 

= (ln(7l))-(27+l)(fe-&)/((26+27)(26+27))^(27+l)/(26+27)(6+7)/(6+7) 

tends to infinity, with n, the proof of (33) and hence of the theorem is finished. 
We can prove that for each xq 
therefore $f_{1,n}^Z(x) = f_0^Z(x)(1 + o(1))$, where $o(1) \to 0$, $n \to \infty$, uniformly in $x$. As we chose
$g_0 > 0$ then $f_0^Z > 0$ and together with the previous statement it means that for any $M > 0$
we can find a constant $c_2 > 0$ such that $f_{1,n}^Z \ge 1/c_2$ on $[-M, M]$. Moreover, for some $M > 0$
large enough, see Butucea and Tsybakov [6], $f_0^Z(x) = g_0 \star f_\varepsilon(x) \ge c_2/x^2$, as $|x| \ge M$.

Therefore, for large enough $M > 0$, $f_{1,n}^Z(x) \ge 1/(c_3|x|^2)$, for some constant $c_3 > 0$ and
for $|x| \ge M$. Finally, we deal with



2frZ rz , -2b+i f [Girn{-~x„))-kfeY{x) 



< f C2 / [G(m(. - xo)) * fe]\x)dx 

\ J\x\<M 

+ C3 f \x\^[G{m{-~x„))*f,f{x)dx], 

J\x\>M 



say $T_1$ and $T_2$, for some fixed, large $M > 0$. Then



Ti < m 



-26-1 

27t 



G* 



du 



-2b-l 



-Vd.<C5m-^--<C6^^ 



(35) 



./4 



For $T_2$ we follow the similar proof in Butucea and Tsybakov [6] and use condition (21)
to get



2n 



d f 1 



Ou\m \m 



du 



(36) 



Therefore, from (35) and (36) we have $\chi^2(f_0^Z, f_{1,n}^Z) \le \kappa_n$, with $\kappa_n = c_\chi c_0\ln(n)/n$. We use
the fact that $-u(1 + u) \le \ln(1 - u) \le -u$ for all $u \in [0, 1/2]$ and that (34) implies that
$|u| = |[G(m(\cdot - x_0)) \star f_\varepsilon](x)|/f_{1,n}^Z(x) \le 1/2$ for $n$ large enough to get

Er[^.„] = /ln(l - MllZlig^) 

> - / [G(;m) . A] ix-xo)dx-[ iG{-M^m---o) 

>'X\foJl,n)>-'^n, 
for $n$ large enough. Indeed, note that $\int G(\cdot, m) = 0$ and therefore $\int [G(\cdot, m) \star f_\varepsilon](x -
x_0)\,dx = 0$. Moreover,



Var 



r(^r.) < MZl J = / h.^ (l - '""^''"^^^^ff!)""'"^ ) fti^) 



< 



[G(-,m)*/,]2(x--x-o)/ , [G(-,m)*/e]2(x-xo) 



as by (34): $\sup_x |f_0^Z(x)/f_{1,n}^Z(x)|$ is bounded from above by some constant depending only
on $g_0$ and $f_\varepsilon$. By similar calculations, we also check that



Vari(Zi,„)>iEi(Z2„) = i 



In^ 1 



[G{-,m)kf^]{x~xa) 



fLix) 



fiAx)dx 



> 



> 



1 f[Gi;m)kf,]^ix-xo) 



dx 



[G{-, m) ★ fef{x - Xo) da; > c^k„ 



and that 



■Ei(Z,,„) 



^ nEi|Zi,„|4 ^ n/[G(-,m)^A]4(x-xo)dx(l + o(l)) 
nc/ |G*(w,m)/;(u)P du(/ |G*(u,to)/,*H| du)^ 



< 



ln(n) • m-2b-27+i 

< c——!- — 2 = o(l). 

In (n) 

as $n \to \infty$ and since $b > 1/2$. Next we apply Lyapunov's central limit theorem for
triangular arrays, see Petrov [27], to get $P_1(U_n \ge u_n) \ge 1 - \varepsilon$, as, when $n \to +\infty$,



ln(T) + At„ -(27+l)/(26 + 27) + CxCo /— - 
> u„ = , = , vln(n) 



-OO. 



(2) Case $a, r > 0$ and $\rho = 0$. Without loss of generality we consider $b = 0$.
In this case, take some $a \in [\underline{a}, \bar{a}]$ and $g_0$ belonging to $\mathcal{S}(a, r, L/2)$ such that $g_0 > 0$ and
$g_0(x) \ge c|x|^{-2}$ as $|x| \to \infty$. Let us consider a function $G$ as for the case 1 such that $G^*$ is
three-times continuously differentiable having the property

I{n/2 < \u\ < 371/4) ^ ^ ^ /(7t/4 < |^.| < n) 



C(l - ' ' - c{l+U^) 



Next, pi^„(x) = go{x) + \J cq Inlnn/ nm'^^^ ^'^ G{m{x — xq)), where m is such that 

coi^^m^T+i^-i exp{2a{nmy-) < 2nL/2. (37) 
n 

Note that this gives a first-order approximation of $m = (\log n/(2a))^{1/r}$. Then, similarly
to the case 1, $g_{1,n}$ is a proper density function as soon as $n$ is large enough and for some
$M > 0$ we have $f_{1,n}^Z(x) = g_{1,n} \star f_\varepsilon(x) \ge C|x|^{-2}$ for all $|x| \ge M$.

By using (37), we get that gi^n belongs to S{a,r,L) for any a> a. Next, \gi^n{xo) — 
9oixo)\/4'n.a,r = co|G(0)| > and we get, in the same way as for case 1, 



< co^^m^^+ici / [G(™(- - xo)) * /e]'(a:) d.T(l + o(l)) 



n 

In In n 

< CgCy =: K„. 

n 

Let us choose Cq small such that cqc^ < (r — r)(27 + l)/(rr) and let ^ and r be defined 

by 

CoCx < C < -zr^(27 + 1) and r = ln(n)"^. 

On the one hand, this implies rq^ —f oo with n. On the other hand, after checking again 
that Lyapunov's central limit theorem holds in this case we get 



L(dPo/dPi > r) > Pi([/„ > u„) > 1 - e. 



as Un = (— ln(T) + riK„)(c„nK„)^-'-/^ = (— ^ + coc^){cyCoC^)~^^^ \/ln\n{n) ~oo. 

(3) Case $r > 0$, $0 < \rho < 1$ and $r \in [\underline{r}, \bar{r}]$ such that $r > \rho$. Without loss of generality we
consider $b = 0$.

As in the second case, take some $a \in [\underline{a}, \bar{a}]$ and $g_0$ belonging to $\mathcal{S}(a, r, L/2)$ such that
$g_0 > 0$ and $g_0(x) \ge c|x|^{-2}$ as $|x| \to \infty$. Let also $G$ be a function such that $G^*$ is three-times
continuously differentiable with a bounded first derivative and having the property

$$I(\pi/2 \le |u| \le 3\pi/4) \le G^*(u) \le I(\pi/4 \le |u| \le \pi).$$

Next, define $g_{1,n}$ via its Fourier transform

9l,n{n)^g*o{u) + CO ^m''-i/2e2"l«l''G*(|z.r - (7r™)'')e'"^°, 
where $m$ is the solution of the equation

$$2a(\pi m)^r + 2\alpha(\pi m)^\rho = \log n - (\log\log n)^2. \qquad (38)$$

We stress the fact that $m$ is no longer a scaling parameter of the function $G$ in this
construction.

Again, as previously, we can check that $g_{1,n}$ is a proper probability density, as soon as
$n$ is large enough, and that for some $M > 0$ we have $f_{1,n}^Z(x) \ge C|x|^{-2}$ for all $|x| \ge M$.
Let us check that $g_{1,n}$ belongs to $\mathcal{S}(a, r, L)$. It is enough to bound from above

{2nn)-^ J cge-2"("'")'"m2''"ic4"l«l''|G*(|u|''- (7Tm)'')|^e2^l"l-du 

< (27tn)-ic2m2''-ie-2"(™")'' [ e4"l«l^+2iil"Fdw 

J n/4<\u\p-{nm)P<3n/4 

< c2c2n-im2''-re2s("")-+2"("™)'', 
which tends to when 7n is defined by (38). Next, 



2'K^fn y7t/2<|u|P-(7tm)P<7t 

pa(7Tm)'' 
) e 



2n^ 

and we can check similarly to Butucea and Tsybakov [6] that for $m$ solution of (38) this
sequence is equivalent to $\varphi_{n,a,r}$ when $n \to \infty$. Finally



<4\ / [igi,n-9o)'*^fe?{x)dx+ / a;2[(5i_„-c,o)*/e]^(a;)dx [>, 

lJ\x\<M J\x\>M 

say T1+T2. Then 

Ti < c^C4n-ie-2«("™)''m2''-iy"|G*(|u|''- (7Tm)'')/;(M)|'du 

< c2c57i-ie"2«("™)''m2''-i / e^^l^l^dT. = clceinm)P/n. 

Jn/4<\u\P-{nm)P<3n/4 
Moreover, under the additional assumption (22) that $|df_\varepsilon^*(u)/du| \le O(1)|u|^{\rho-1}\exp(-\alpha|u|^\rho)$
as $|u| \to \infty$,



±[G*i\u\P^inmr)f:iu)]' 



du 



< C8n-ie-2«("™)''m2''-i / |w|2(P-i)e2"l«l'' du < c^^^ = o(ri), 

Jn/4<\u\p-{mn)P<3n/i 

for p < 1 and n large enough. Thus X^ifo ^ fin) ^ CQC^{nm)P / n k„. 

Let Co be small such that CqC^ < 2a and let ^ and r be defined by CqC^ < ^ < 2a and r = 
g-e(ni„(„)/(2a))'>/i:^ j^^^^ T02_„ J</,2 __ > (ln(n))^exp((-e + 2a)(ln(?i)/(2a))''/i: + 
i?(ln(7i))'-^) tends to infinity for some real numbers A, B, C, as C < p/r and ^ < 2a. 
We check that Lyapunov's theorem holds and that 

- ln(T) + riKn -^{n\n{n)/{2a))P/^ + clcJ'Km)P 

with n, as m defined by (38) is larger than (ln(ri,)/(2a))^/-. 

The proof that (pn is the minimax rate of estimation in this case repeats the proof of 
(3) with modified choice of gi^n via its Fourier transform 

p— Q(7Tm)'' 

.g*„(u)=.g*(u)+co m^P-^y^c^"\<G*{\u\P~{nmY)e'^-\ 

V"- 

where m is the solution of equation (38). 

This gives the rate |(7i,„(x-o) — 170(2^0)1 > coC3m~(''~-'^^/2e"(7tm) J ^J^^ which is equivalent 
to Vm for n large enough and nx^{fo , /£„) < c§C6 + cgm^''"^ < clc^. Thus, the rate iy9„ 
is a minimax rate of convergence for r>p, p<l. 